[llvm-branch-commits] [llvm-objdump] -r: support CREL (PR #97382)

2024-07-02 Thread James Henderson via llvm-branch-commits


@@ -1022,6 +1033,47 @@ ELFObjectFile::section_rel_begin(DataRefImpl Sec) 
const {
   uintptr_t SHT = reinterpret_cast((*SectionsOrErr).begin());
   RelData.d.a = (Sec.p - SHT) / EF.getHeader().e_shentsize;
   RelData.d.b = 0;
+  if (reinterpret_cast(Sec.p)->sh_type == ELF::SHT_CREL) {
+if (RelData.d.a + 1 > Crels.size()) {
+  Crels.resize(RelData.d.a + 1);
+  CrelErrs.resize(RelData.d.a + 1);
+}
+if (Crels[RelData.d.a].empty()) {
+  // Decode SHT_CREL. See ELFFile::decodeCrel.
+  ArrayRef Content = cantFail(getSectionContents(Sec));
+  DataExtractor Data(Content, ELFT::Endianness == endianness::little,
+ sizeof(typename ELFT::Addr));
+  DataExtractor::Cursor Cur(0);
+  const uint64_t Hdr = Data.getULEB128(Cur);
+  const size_t Count = Hdr / 8;
+  const size_t FlagBits = Hdr & ELF::CREL_HDR_ADDEND ? 3 : 2;
+  const size_t Shift = Hdr % ELF::CREL_HDR_ADDEND;
+  uintX_t Offset = 0, Addend = 0;
+  uint32_t Symidx = 0, Type = 0;
+  for (size_t i = 0; i != Count; ++i) {

jh7370 wrote:

It really doesn't feel right that this whole algorithm is duplicated from the 
version in ELF.cpp. Surely they can be folded together into shared code 
somewhere?

https://github.com/llvm/llvm-project/pull/97382
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [clang][test] add testing for the AST matcher reference (PR #94248)

2024-07-02 Thread Julian Schmidt via llvm-branch-commits

5chmidti wrote:

Thanks.

> ... how this impacts build times for Clang itself? I'm assuming that if 
> ASTMatchers.h isn't modified, CMake won't re-run 
> generate_ast_matcher_doc_tests.py and so the compile time performance hit is 
> only on full rebuilds or when changing the header?

The 'state' of the generated file is only checked when the `ASTMatchersTests` 
target is built, because it's the only thing that depends on the generated 
file. And the file is only generated when: the file does not exist or 
`ASTMatchers.h` has changed (excluding transitive changes in includes) or 
`generate_ast_matcher_doc_tests.py` has changed.

https://github.com/llvm/llvm-project/pull/94248
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm-objdump] -r: support CREL (PR #97382)

2024-07-02 Thread Peter Smith via llvm-branch-commits


@@ -2687,6 +2687,15 @@ void Dumper::printRelocations() {
<< "VALUE\n";
 
 for (SectionRef Section : P.second) {
+  // CREL requires decoding and has its specific errors.
+  if (O.isELF() && ELFSectionRef(Section).getType() == ELF::SHT_CREL) {
+const ELFObjectFileBase *ELF = cast(&O);
+StringRef Err = ELF->getCrelError(Section);
+if (!Err.empty()) {
+  reportUniqueWarning(Err);

smithp35 wrote:

If we are giving a warning rather than an error, then perhaps we should use 
something like `getCrelDecodeProblem`

Just looks a bit strange recording errors, but then only reporting a warning.

My expectation for a decode error would be that we report it, but continue as 
best we can. I would expect a non-zero error code though. Please ignore if this 
isn't llvm-objdump practice. 

https://github.com/llvm/llvm-project/pull/97382
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm-objdump] -r: support CREL (PR #97382)

2024-07-02 Thread Peter Smith via llvm-branch-commits


@@ -292,6 +295,11 @@ template  class ELFObjectFile : public 
ELFObjectFileBase {
   const Elf_Shdr *DotSymtabSec = nullptr; // Symbol table section.
   const Elf_Shdr *DotSymtabShndxSec = nullptr; // SHT_SYMTAB_SHNDX section.
 
+  // Hold CREL relocations for SectionRef::relocations().
+  mutable SmallVector, 0> Crels;
+  // Hold CREL decoding errors.
+  mutable SmallVector CrelErrs;

smithp35 wrote:

I expect that most objects would have no decoding errors, yet it looks like 
we're allocating a potentially large (intermediate ELF file as output of LTO 
for example) amount of empty strings.

An alternative could be an unordered_map of section indexes to strings. This 
would also look a bit cleaner than just checking if the string is empty for no 
errors.

I don't have a strong opinion though.

https://github.com/llvm/llvm-project/pull/97382
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm-objdump] -r: support CREL (PR #97382)

2024-07-02 Thread Peter Smith via llvm-branch-commits


@@ -1453,6 +1525,15 @@ template  bool 
ELFObjectFile::isRelocatableObject() const {
   return EF.getHeader().e_type == ELF::ET_REL;
 }
 
+template 
+StringRef ELFObjectFile::getCrelError(DataRefImpl Sec) const {
+  uintptr_t SHT = reinterpret_cast(cantFail(EF.sections()).begin());
+  auto I = (Sec.p - SHT) / EF.getHeader().e_shentsize;
+  if (I < CrelErrs.size())
+return CrelErrs[I];
+  return "";

smithp35 wrote:

Wouldn't this be an error (or perhaps internal error) for an out of bounds 
access?

https://github.com/llvm/llvm-project/pull/97382
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm-objdump] -r: support CREL (PR #97382)

2024-07-02 Thread Peter Smith via llvm-branch-commits

https://github.com/smithp35 edited 
https://github.com/llvm/llvm-project/pull/97382
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm-objdump] -r: support CREL (PR #97382)

2024-07-02 Thread Peter Smith via llvm-branch-commits


@@ -2687,6 +2687,15 @@ void Dumper::printRelocations() {
<< "VALUE\n";
 
 for (SectionRef Section : P.second) {
+  // CREL requires decoding and has its specific errors.

smithp35 wrote:

I recommend
```
// CREL sections require decoding, each section may have its own specific 
errors.
```

https://github.com/llvm/llvm-project/pull/97382
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm-objdump] -r: support CREL (PR #97382)

2024-07-02 Thread Peter Smith via llvm-branch-commits

https://github.com/smithp35 commented:

I agree with jh7370 that it would be good to extract the decoding logic, it 
seems like each tool needs some way of passing in a way to step through the 
data, return the decoded data, and a way of storing errors. It is possible that 
this will end up being more code than just inlining the algorithm, however I'm 
hoping that this additional code is simpler and less error prone than copying 
the logic.

May be worth a try to see what it looks like.

https://github.com/llvm/llvm-project/pull/97382
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Enable atomic optimizer for divergent i64 and double values (PR #96934)

2024-07-02 Thread Vikram Hegde via llvm-branch-commits


@@ -178,6 +178,21 @@ bool AMDGPUAtomicOptimizerImpl::run(Function &F) {
   return Changed;
 }
 
+static bool isOptimizableAtomic(Type *Ty) {
+  switch (Ty->getTypeID()) {
+  case Type::FloatTyID:
+  case Type::DoubleTyID:
+return true;
+  case Type::IntegerTyID: {
+unsigned size = Ty->getIntegerBitWidth();

vikramRH wrote:

done

https://github.com/llvm/llvm-project/pull/96934
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Enable atomic optimizer for divergent i64 and double values (PR #96934)

2024-07-02 Thread Vikram Hegde via llvm-branch-commits


@@ -230,8 +245,7 @@ void 
AMDGPUAtomicOptimizerImpl::visitAtomicRMWInst(AtomicRMWInst &I) {
   // value to the atomic calculation. We can only optimize divergent values if
   // we have DPP available on our subtarget, and the atomic operation is 32
   // bits.
-  if (ValDivergent &&
-  (!ST->hasDPP() || DL->getTypeSizeInBits(I.getType()) != 32)) {
+  if (ValDivergent && (!ST->hasDPP() || !isOptimizableAtomic(I.getType( {

vikramRH wrote:

updated

https://github.com/llvm/llvm-project/pull/96934
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Enable atomic optimizer for divergent i64 and double values (PR #96934)

2024-07-02 Thread Vikram Hegde via llvm-branch-commits


@@ -178,6 +178,21 @@ bool AMDGPUAtomicOptimizerImpl::run(Function &F) {
   return Changed;
 }
 
+static bool isOptimizableAtomic(Type *Ty) {

vikramRH wrote:

updated, thanks

https://github.com/llvm/llvm-project/pull/96934
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Enable atomic optimizer for divergent i64 and double values (PR #96934)

2024-07-02 Thread Vikram Hegde via llvm-branch-commits

vikramRH wrote:

> > [AMDGPU] Enable atomic optimizer for divergent i64 and double values
> 
> Needs some i64 tests

added new i64 tests, however I see there currently exists an issue with DPP 
path where dpp combine partially fuses the mov_dpp pieces causing machine CSE 
crash. I have proposed https://github.com/llvm/llvm-project/pull/97413 for now. 
what would be the correct way forward here ?

https://github.com/llvm/llvm-project/pull/96934
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Enable atomic optimizer for divergent i64 and double values (PR #96934)

2024-07-02 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> > > [AMDGPU] Enable atomic optimizer for divergent i64 and double values
> > 
> > 
> > Needs some i64 tests
> 
> added new i64 tests, however I see there currently exists an issue with DPP 
> path where dpp combine partially fuses the mov_dpp pieces causing machine CSE 
> crash. I have proposed #97413 for now. what would be the correct way forward 
> here ?

You didn't include a (very necessary) test in #97413, but DPP instructions 
shouldn't be candidates for trivial CSE in the first place? 

https://github.com/llvm/llvm-project/pull/96934
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Enable atomic optimizer for divergent i64 and double values (PR #96934)

2024-07-02 Thread Vikram Hegde via llvm-branch-commits

vikramRH wrote:

> > > > [AMDGPU] Enable atomic optimizer for divergent i64 and double values
> > > 
> > > 
> > > Needs some i64 tests
> > 
> > 
> > added new i64 tests, however I see there currently exists an issue with DPP 
> > path where dpp combine partially fuses the mov_dpp pieces causing machine 
> > CSE crash. I have proposed #97413 for now. what would be the correct way 
> > forward here ?
> 
> You didn't include a (very necessary) test in #97413, but DPP instructions 
> shouldn't be candidates for trivial CSE in the first place?

sorry about that, just wanted to bring this up first (I will update the PR with 
a test). The issue is not with DPP instructions themselves but with the 
REG_SEQUENCE which is generated after fusing the 32 bit pieces.

https://github.com/llvm/llvm-project/pull/96934
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [RISCV][lld] Support merging RISC-V Atomics ABI attributes (PR #97347)

2024-07-02 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi updated 
https://github.com/llvm/llvm-project/pull/97347


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [RISCV][lld] Support merging RISC-V Atomics ABI attributes (PR #97347)

2024-07-02 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi updated 
https://github.com/llvm/llvm-project/pull/97347


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lld] [llvm] Reapply "[llvm][RISCV] Enable trailing fences for seq-cst stores by default (#87376)" (PR #90267)

2024-07-02 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi updated 
https://github.com/llvm/llvm-project/pull/90267

>From 788f58b302f20a000db3a9741e54cc861c9dcb88 Mon Sep 17 00:00:00 2001
From: Paul Kirth 
Date: Mon, 1 Jul 2024 10:13:23 -0700
Subject: [PATCH] Fix mismerge

Created using spr 1.3.4
---
 lld/ELF/Arch/RISCV.cpp| 70 ---
 lld/test/ELF/riscv-attributes.s   |  4 +-
 llvm/include/llvm/Support/RISCVAttributes.h   |  4 +-
 .../MCTargetDesc/RISCVTargetStreamer.cpp  | 15 ++--
 4 files changed, 39 insertions(+), 54 deletions(-)

diff --git a/lld/ELF/Arch/RISCV.cpp b/lld/ELF/Arch/RISCV.cpp
index 1d9be92601006..0d0ab5de9acc1 100644
--- a/lld/ELF/Arch/RISCV.cpp
+++ b/lld/ELF/Arch/RISCV.cpp
@@ -1088,65 +1088,51 @@ static void mergeAtomic(DenseMap::iterator it,
 const InputSectionBase *oldSection,
 const InputSectionBase *newSection, unsigned int 
oldTag,
 unsigned int newTag) {
-  using RISCVAttrs::RISCVAtomicAbiTag;
+  using RISCVAttrs::RISCVAtomicAbiTag::AtomicABI;
   // Same tags stay the same, and UNKNOWN is compatible with anything
-  if (oldTag == newTag ||
-  newTag == static_cast(RISCVAtomicAbiTag::UNKNOWN))
+  if (oldTag == newTag || newTag == AtomicABI::UNKNOWN)
 return;
 
-  auto reportAbiError = [&]() {
-errorOrWarn("atomic abi mismatch for " + oldSection->name + "\n>>> " +
-toString(oldSection) + ": atomic_abi=" + Twine(oldTag) +
-"\n>>> " + toString(newSection) +
-": atomic_abi=" + Twine(newTag));
-  };
-
-  switch (static_cast(oldTag)) {
-  case RISCVAtomicAbiTag::UNKNOWN:
+  switch (oldTag) {
+  case AtomicABI::UNKNOWN:
 it->getSecond() = newTag;
 return;
-  case RISCVAtomicAbiTag::A6C:
-switch (static_cast(newTag)) {
-case RISCVAtomicAbiTag::A6S:
-  it->getSecond() = static_cast(RISCVAtomicAbiTag::A6C);
+  case AtomicABI::A6C:
+switch (newTag) {
+case AtomicABI::A6S:
+  it->getSecond() = AtomicABI::A6C;
   return;
-case RISCVAtomicAbiTag::A7:
-  reportAbiError();
+case AtomicABI::A7:
+  errorOrWarn(toString(oldSection) + " has atomic_abi=" + Twine(oldTag) +
+  " but " + toString(newSection) +
+  " has atomic_abi=" + Twine(newTag));
   return;
-case RISCVAttrs::RISCVAtomicAbiTag::UNKNOWN:
-case RISCVAttrs::RISCVAtomicAbiTag::A6C:
-  break;
 };
-break;
 
-  case RISCVAtomicAbiTag::A6S:
-switch (static_cast(newTag)) {
-case RISCVAtomicAbiTag::A6C:
-  it->getSecond() = static_cast(RISCVAtomicAbiTag::A6C);
+  case AtomicABI::A6S:
+switch (newTag) {
+case AtomicABI::A6C:
+  it->getSecond() = AtomicABI::A6C;
   return;
-case RISCVAtomicAbiTag::A7:
-  it->getSecond() = static_cast(RISCVAtomicAbiTag::A7);
+case AtomicABI::A7:
+  it->getSecond() = AtomicABI::A7;
   return;
-case RISCVAttrs::RISCVAtomicAbiTag::UNKNOWN:
-case RISCVAttrs::RISCVAtomicAbiTag::A6S:
-  break;
 };
-break;
 
-  case RISCVAtomicAbiTag::A7:
-switch (static_cast(newTag)) {
-case RISCVAtomicAbiTag::A6S:
-  it->getSecond() = static_cast(RISCVAtomicAbiTag::A7);
+  case AtomicABI::A7:
+switch (newTag) {
+case AtomicABI::A6S:
+  it->getSecond() = AtomicABI::A7;
   return;
-case RISCVAtomicAbiTag::A6C:
-  reportAbiError();
+case AtomicABI::A6C:
+  errorOrWarn(toString(oldSection) + " has atomic_abi=" + Twine(oldTag) +
+  " but " + toString(newSection) +
+  " has atomic_abi=" + Twine(newTag));
   return;
-case RISCVAttrs::RISCVAtomicAbiTag::UNKNOWN:
-case RISCVAttrs::RISCVAtomicAbiTag::A7:
-  break;
 };
+  default:
+llvm_unreachable("unknown AtomicABI");
   };
-  llvm_unreachable("unknown AtomicABI");
 }
 
 static RISCVAttributesSection *
diff --git a/lld/test/ELF/riscv-attributes.s b/lld/test/ELF/riscv-attributes.s
index 38b0fe8e7797e..91321f33297aa 100644
--- a/lld/test/ELF/riscv-attributes.s
+++ b/lld/test/ELF/riscv-attributes.s
@@ -48,9 +48,7 @@
 # RUN: llvm-mc -filetype=obj -triple=riscv64  atomic_abi_A6C.s -o 
atomic_abi_A6C.o
 # RUN: llvm-mc -filetype=obj -triple=riscv64  atomic_abi_A7.s -o 
atomic_abi_A7.o
 # RUN: not ld.lld atomic_abi_A6C.o atomic_abi_A7.o -o /dev/null 2>&1 | 
FileCheck %s --check-prefix=ATOMIC_ABI_ERROR --implicit-check-not=error:
-# ATOMIC_ABI_ERROR: error: atomic abi mismatch for .riscv.attributes
-# ATOMIC_ABI_ERROR-NEXT: >>> atomic_abi_A6C.o:(.riscv.attributes): atomic_abi=1
-# ATOMIC_ABI_ERROR-NEXT: >>> atomic_abi_A7.o:(.riscv.attributes): atomic_abi=3
+# ATOMIC_ABI_ERROR: error: atomic_abi_A6C.o:(.riscv.attributes) has 
atomic_abi=1 but atomic_abi_A7.o:(.riscv.attributes) has atomic_abi=3
 
 # RUN: llvm-mc -filetype=obj -triple=riscv64  atomic_abi_A6S.s -o 
atomic_abi_A6S.o
 # RUN: ld.lld atomic_abi_A6S.o atomic_abi_A6C.o -o atomic_abi_A6C_A6S
diff --git a/llvm/include/llvm/Sup

[llvm-branch-commits] [lld] [llvm] Reapply "[llvm][RISCV] Enable trailing fences for seq-cst stores by default (#87376)" (PR #90267)

2024-07-02 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi updated 
https://github.com/llvm/llvm-project/pull/90267

>From 788f58b302f20a000db3a9741e54cc861c9dcb88 Mon Sep 17 00:00:00 2001
From: Paul Kirth 
Date: Mon, 1 Jul 2024 10:13:23 -0700
Subject: [PATCH] Fix mismerge

Created using spr 1.3.4
---
 lld/ELF/Arch/RISCV.cpp| 70 ---
 lld/test/ELF/riscv-attributes.s   |  4 +-
 llvm/include/llvm/Support/RISCVAttributes.h   |  4 +-
 .../MCTargetDesc/RISCVTargetStreamer.cpp  | 15 ++--
 4 files changed, 39 insertions(+), 54 deletions(-)

diff --git a/lld/ELF/Arch/RISCV.cpp b/lld/ELF/Arch/RISCV.cpp
index 1d9be92601006..0d0ab5de9acc1 100644
--- a/lld/ELF/Arch/RISCV.cpp
+++ b/lld/ELF/Arch/RISCV.cpp
@@ -1088,65 +1088,51 @@ static void mergeAtomic(DenseMap::iterator it,
 const InputSectionBase *oldSection,
 const InputSectionBase *newSection, unsigned int 
oldTag,
 unsigned int newTag) {
-  using RISCVAttrs::RISCVAtomicAbiTag;
+  using RISCVAttrs::RISCVAtomicAbiTag::AtomicABI;
   // Same tags stay the same, and UNKNOWN is compatible with anything
-  if (oldTag == newTag ||
-  newTag == static_cast(RISCVAtomicAbiTag::UNKNOWN))
+  if (oldTag == newTag || newTag == AtomicABI::UNKNOWN)
 return;
 
-  auto reportAbiError = [&]() {
-errorOrWarn("atomic abi mismatch for " + oldSection->name + "\n>>> " +
-toString(oldSection) + ": atomic_abi=" + Twine(oldTag) +
-"\n>>> " + toString(newSection) +
-": atomic_abi=" + Twine(newTag));
-  };
-
-  switch (static_cast(oldTag)) {
-  case RISCVAtomicAbiTag::UNKNOWN:
+  switch (oldTag) {
+  case AtomicABI::UNKNOWN:
 it->getSecond() = newTag;
 return;
-  case RISCVAtomicAbiTag::A6C:
-switch (static_cast(newTag)) {
-case RISCVAtomicAbiTag::A6S:
-  it->getSecond() = static_cast(RISCVAtomicAbiTag::A6C);
+  case AtomicABI::A6C:
+switch (newTag) {
+case AtomicABI::A6S:
+  it->getSecond() = AtomicABI::A6C;
   return;
-case RISCVAtomicAbiTag::A7:
-  reportAbiError();
+case AtomicABI::A7:
+  errorOrWarn(toString(oldSection) + " has atomic_abi=" + Twine(oldTag) +
+  " but " + toString(newSection) +
+  " has atomic_abi=" + Twine(newTag));
   return;
-case RISCVAttrs::RISCVAtomicAbiTag::UNKNOWN:
-case RISCVAttrs::RISCVAtomicAbiTag::A6C:
-  break;
 };
-break;
 
-  case RISCVAtomicAbiTag::A6S:
-switch (static_cast(newTag)) {
-case RISCVAtomicAbiTag::A6C:
-  it->getSecond() = static_cast(RISCVAtomicAbiTag::A6C);
+  case AtomicABI::A6S:
+switch (newTag) {
+case AtomicABI::A6C:
+  it->getSecond() = AtomicABI::A6C;
   return;
-case RISCVAtomicAbiTag::A7:
-  it->getSecond() = static_cast(RISCVAtomicAbiTag::A7);
+case AtomicABI::A7:
+  it->getSecond() = AtomicABI::A7;
   return;
-case RISCVAttrs::RISCVAtomicAbiTag::UNKNOWN:
-case RISCVAttrs::RISCVAtomicAbiTag::A6S:
-  break;
 };
-break;
 
-  case RISCVAtomicAbiTag::A7:
-switch (static_cast(newTag)) {
-case RISCVAtomicAbiTag::A6S:
-  it->getSecond() = static_cast(RISCVAtomicAbiTag::A7);
+  case AtomicABI::A7:
+switch (newTag) {
+case AtomicABI::A6S:
+  it->getSecond() = AtomicABI::A7;
   return;
-case RISCVAtomicAbiTag::A6C:
-  reportAbiError();
+case AtomicABI::A6C:
+  errorOrWarn(toString(oldSection) + " has atomic_abi=" + Twine(oldTag) +
+  " but " + toString(newSection) +
+  " has atomic_abi=" + Twine(newTag));
   return;
-case RISCVAttrs::RISCVAtomicAbiTag::UNKNOWN:
-case RISCVAttrs::RISCVAtomicAbiTag::A7:
-  break;
 };
+  default:
+llvm_unreachable("unknown AtomicABI");
   };
-  llvm_unreachable("unknown AtomicABI");
 }
 
 static RISCVAttributesSection *
diff --git a/lld/test/ELF/riscv-attributes.s b/lld/test/ELF/riscv-attributes.s
index 38b0fe8e7797e..91321f33297aa 100644
--- a/lld/test/ELF/riscv-attributes.s
+++ b/lld/test/ELF/riscv-attributes.s
@@ -48,9 +48,7 @@
 # RUN: llvm-mc -filetype=obj -triple=riscv64  atomic_abi_A6C.s -o 
atomic_abi_A6C.o
 # RUN: llvm-mc -filetype=obj -triple=riscv64  atomic_abi_A7.s -o 
atomic_abi_A7.o
 # RUN: not ld.lld atomic_abi_A6C.o atomic_abi_A7.o -o /dev/null 2>&1 | 
FileCheck %s --check-prefix=ATOMIC_ABI_ERROR --implicit-check-not=error:
-# ATOMIC_ABI_ERROR: error: atomic abi mismatch for .riscv.attributes
-# ATOMIC_ABI_ERROR-NEXT: >>> atomic_abi_A6C.o:(.riscv.attributes): atomic_abi=1
-# ATOMIC_ABI_ERROR-NEXT: >>> atomic_abi_A7.o:(.riscv.attributes): atomic_abi=3
+# ATOMIC_ABI_ERROR: error: atomic_abi_A6C.o:(.riscv.attributes) has 
atomic_abi=1 but atomic_abi_A7.o:(.riscv.attributes) has atomic_abi=3
 
 # RUN: llvm-mc -filetype=obj -triple=riscv64  atomic_abi_A6S.s -o 
atomic_abi_A6S.o
 # RUN: ld.lld atomic_abi_A6S.o atomic_abi_A6C.o -o atomic_abi_A6C_A6S
diff --git a/llvm/include/llvm/Sup

[llvm-branch-commits] [BOLT] Drop macro-fusion alignment (PR #97358)

2024-07-02 Thread Davide Italiano via llvm-branch-commits

https://github.com/dcci approved this pull request.

Thanks for measuring the impact and removing this code, Amir.

https://github.com/llvm/llvm-project/pull/97358
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [BOLT] Drop macro-fusion alignment (PR #97358)

2024-07-02 Thread Davide Italiano via llvm-branch-commits

dcci wrote:

Maybe you can add a comment to the title explaining *why* we're removing it.

https://github.com/llvm/llvm-project/pull/97358
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [BOLT] Drop macro-fusion alignment (PR #97358)

2024-07-02 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov edited https://github.com/llvm/llvm-project/pull/97358
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [BOLT] Drop macro-fusion alignment (PR #97358)

2024-07-02 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov closed https://github.com/llvm/llvm-project/pull/97358
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [BOLT] Drop macro-fusion alignment (PR #97358)

2024-07-02 Thread Davide Italiano via llvm-branch-commits

dcci wrote:

is this the correct destination branch?

https://github.com/llvm/llvm-project/pull/97358
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [BOLT] Drop macro-fusion alignment (PR #97358)

2024-07-02 Thread Amir Ayupov via llvm-branch-commits

aaupov wrote:

It's not. I landed to main manually.

https://github.com/llvm/llvm-project/pull/97358
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Match functions with calls as anchors (PR #96596)

2024-07-02 Thread Shaw Young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/96596

>From 05d59574d6260b98a469921eb2fccf5398bfafb6 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 24 Jun 2024 23:00:59 -0700
Subject: [PATCH 1/8] Added call to matchWithCallsAsAnchors

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index aafffac3d4b1c..1a0e5d239d252 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -479,6 +479,9 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
 if (!YamlBF.Used && BF && !ProfiledFunctions.count(BF))
   matchProfileToFunction(YamlBF, *BF);
 
+  uint64_t MatchedWithCallsAsAnchors = 0;
+  matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
+
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 if (!YamlBF.Used && opts::Verbosity >= 1)
   errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name

>From 77ef0008f4f5987719555e6cc3e32da812ae0f31 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 24 Jun 2024 23:11:43 -0700
Subject: [PATCH 2/8] Changed CallHashToBF representation

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index 1a0e5d239d252..91b01a99c7485 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -29,6 +29,10 @@ static llvm::cl::opt
cl::desc("ignore hash while reading function profile"),
cl::Hidden, cl::cat(BoltOptCategory));
 
+llvm::cl::opt MatchWithCallsAsAnchors("match-with-calls-as-anchors",
+  cl::desc("Matches with calls as anchors"),
+  cl::Hidden, cl::cat(BoltOptCategory));
+
 llvm::cl::opt ProfileUseDFS("profile-use-dfs",
   cl::desc("use DFS order for YAML profile"),
   cl::Hidden, cl::cat(BoltOptCategory));
@@ -353,7 +357,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 llvm_unreachable("Unhandled HashFunction");
   };
 
-  std::unordered_map CallHashToBF;
+  std::unordered_map CallHashToBF;
 
   for (BinaryFunction *BF : BC.getAllBinaryFunctions()) {
 if (ProfiledFunctions.count(BF))
@@ -375,12 +379,12 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
   for (const std::string &FunctionName : FunctionNames)
 HashString.append(FunctionName);
 }
-CallHashToBF.emplace(ComputeCallHash(HashString), BF);
+CallHashToBF[ComputeCallHash(HashString)] = BF;
   }
 
   std::unordered_map ProfiledFunctionIdToName;
 
-  for (const yaml::bolt::BinaryFunctionProfile YamlBF : YamlBP.Functions)
+  for (const yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 ProfiledFunctionIdToName[YamlBF.Id] = YamlBF.Name;
 
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions) {
@@ -401,7 +405,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 auto It = CallHashToBF.find(Hash);
 if (It == CallHashToBF.end())
   continue;
-matchProfileToFunction(YamlBF, It->second);
+matchProfileToFunction(YamlBF, *It->second);
 ++MatchedWithCallsAsAnchors;
   }
 }
@@ -480,7 +484,8 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
   matchProfileToFunction(YamlBF, *BF);
 
   uint64_t MatchedWithCallsAsAnchors = 0;
-  matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
+  if (opts::MatchWithCallsAsAnchors)
+matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
 
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 if (!YamlBF.Used && opts::Verbosity >= 1)

>From ea7cb68ab9e8e158412c2e752986968968a60d93 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Tue, 25 Jun 2024 09:28:39 -0700
Subject: [PATCH 3/8] Changed BF called FunctionNames to multiset

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index 91b01a99c7485..3b3d73f7af023 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -365,7 +365,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 
 std::string HashString;
 for (const auto &BB : BF->blocks()) {
-  std::set FunctionNames;
+  std::multiset FunctionNames;
   for (const MCInst &Instr : BB) {
 // Skip non-call instructions.
 if (!BC.MIB->isCall(Instr))
@@ -397,9 +397,8 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 std::string &FunctionName = ProfiledFunctionIdToName[CallSite.DestId];
 FunctionNames.insert(FunctionName);
   

[llvm-branch-commits] [llvm] [BOLT] Match functions with calls as anchors (PR #96596)

2024-07-02 Thread Shaw Young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/96596

>From 05d59574d6260b98a469921eb2fccf5398bfafb6 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 24 Jun 2024 23:00:59 -0700
Subject: [PATCH 1/9] Added call to matchWithCallsAsAnchors

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index aafffac3d4b1c..1a0e5d239d252 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -479,6 +479,9 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
 if (!YamlBF.Used && BF && !ProfiledFunctions.count(BF))
   matchProfileToFunction(YamlBF, *BF);
 
+  uint64_t MatchedWithCallsAsAnchors = 0;
+  matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
+
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 if (!YamlBF.Used && opts::Verbosity >= 1)
   errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name

>From 77ef0008f4f5987719555e6cc3e32da812ae0f31 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 24 Jun 2024 23:11:43 -0700
Subject: [PATCH 2/9] Changed CallHashToBF representation

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index 1a0e5d239d252..91b01a99c7485 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -29,6 +29,10 @@ static llvm::cl::opt
cl::desc("ignore hash while reading function profile"),
cl::Hidden, cl::cat(BoltOptCategory));
 
+llvm::cl::opt MatchWithCallsAsAnchors("match-with-calls-as-anchors",
+  cl::desc("Matches with calls as anchors"),
+  cl::Hidden, cl::cat(BoltOptCategory));
+
 llvm::cl::opt ProfileUseDFS("profile-use-dfs",
   cl::desc("use DFS order for YAML profile"),
   cl::Hidden, cl::cat(BoltOptCategory));
@@ -353,7 +357,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 llvm_unreachable("Unhandled HashFunction");
   };
 
-  std::unordered_map CallHashToBF;
+  std::unordered_map CallHashToBF;
 
   for (BinaryFunction *BF : BC.getAllBinaryFunctions()) {
 if (ProfiledFunctions.count(BF))
@@ -375,12 +379,12 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
   for (const std::string &FunctionName : FunctionNames)
 HashString.append(FunctionName);
 }
-CallHashToBF.emplace(ComputeCallHash(HashString), BF);
+CallHashToBF[ComputeCallHash(HashString)] = BF;
   }
 
   std::unordered_map ProfiledFunctionIdToName;
 
-  for (const yaml::bolt::BinaryFunctionProfile YamlBF : YamlBP.Functions)
+  for (const yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 ProfiledFunctionIdToName[YamlBF.Id] = YamlBF.Name;
 
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions) {
@@ -401,7 +405,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 auto It = CallHashToBF.find(Hash);
 if (It == CallHashToBF.end())
   continue;
-matchProfileToFunction(YamlBF, It->second);
+matchProfileToFunction(YamlBF, *It->second);
 ++MatchedWithCallsAsAnchors;
   }
 }
@@ -480,7 +484,8 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
   matchProfileToFunction(YamlBF, *BF);
 
   uint64_t MatchedWithCallsAsAnchors = 0;
-  matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
+  if (opts::MatchWithCallsAsAnchors)
+matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
 
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 if (!YamlBF.Used && opts::Verbosity >= 1)

>From ea7cb68ab9e8e158412c2e752986968968a60d93 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Tue, 25 Jun 2024 09:28:39 -0700
Subject: [PATCH 3/9] Changed BF called FunctionNames to multiset

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index 91b01a99c7485..3b3d73f7af023 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -365,7 +365,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 
 std::string HashString;
 for (const auto &BB : BF->blocks()) {
-  std::set FunctionNames;
+  std::multiset FunctionNames;
   for (const MCInst &Instr : BB) {
 // Skip non-call instructions.
 if (!BC.MIB->isCall(Instr))
@@ -397,9 +397,8 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 std::string &FunctionName = ProfiledFunctionIdToName[CallSite.DestId];
 FunctionNames.insert(FunctionName);
   

[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)

2024-07-02 Thread Shaw Young via llvm-branch-commits

https://github.com/shawbyoung edited 
https://github.com/llvm/llvm-project/pull/96596
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-07-02 Thread via llvm-branch-commits

agozillon wrote:

Small ping for some reviewer attention please, if at all possible on this PR 
stack! :-) thank you very much in advance! 

https://github.com/llvm/llvm-project/pull/96266
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lld] [RISCV][lld] Support merging RISC-V Atomics ABI attributes (PR #97347)

2024-07-02 Thread Paul Kirth via llvm-branch-commits

https://github.com/ilovepi updated 
https://github.com/llvm/llvm-project/pull/97347

>From 5c57d12517ad58310511eb27bf7cc4ec660d4fa7 Mon Sep 17 00:00:00 2001
From: Paul Kirth 
Date: Tue, 2 Jul 2024 09:39:46 -0700
Subject: [PATCH] Add handling for invalid tags

Created using spr 1.3.4
---
 lld/ELF/Arch/RISCV.cpp  | 22 --
 lld/test/ELF/riscv-attributes.s |  9 +
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/lld/ELF/Arch/RISCV.cpp b/lld/ELF/Arch/RISCV.cpp
index c48b9f7c11047..07a1b63be8051 100644
--- a/lld/ELF/Arch/RISCV.cpp
+++ b/lld/ELF/Arch/RISCV.cpp
@@ -1102,7 +1102,20 @@ static void mergeAtomic(DenseMap::iterator it,
 ": atomic_abi=" + Twine(static_cast(newTag)));
   };
 
-  switch (static_cast(oldTag)) {
+  auto reportUnknownAbiError = [](const InputSectionBase *section,
+  RISCVAtomicAbiTag tag) {
+switch (tag) {
+case RISCVAtomicAbiTag::UNKNOWN:
+case RISCVAtomicAbiTag::A6C:
+case RISCVAtomicAbiTag::A6S:
+case RISCVAtomicAbiTag::A7:
+  return;
+};
+errorOrWarn("unknown atomic abi for " + section->name + "\n>>> " +
+toString(section) +
+": atomic_abi=" + Twine(static_cast(tag)));
+  };
+  switch (oldTag) {
   case RISCVAtomicAbiTag::UNKNOWN:
 it->getSecond() = static_cast(newTag);
 return;
@@ -1145,7 +1158,12 @@ static void mergeAtomic(DenseMap::iterator it,
   return;
 };
   };
-  llvm_unreachable("unknown AtomicABI");
+
+  // If we get here, then we have an invalid tag, so report it.
+  // Putting these checks at the end allows us to only do these checks when we
+  // need to, since this is expected to be a rare occurrence.
+  reportUnknownAbiError(oldSection, oldTag);
+  reportUnknownAbiError(newSection, newTag);
 }
 
 static RISCVAttributesSection *
diff --git a/lld/test/ELF/riscv-attributes.s b/lld/test/ELF/riscv-attributes.s
index 38b0fe8e7797e..057223c18418e 100644
--- a/lld/test/ELF/riscv-attributes.s
+++ b/lld/test/ELF/riscv-attributes.s
@@ -52,6 +52,12 @@
 # ATOMIC_ABI_ERROR-NEXT: >>> atomic_abi_A6C.o:(.riscv.attributes): atomic_abi=1
 # ATOMIC_ABI_ERROR-NEXT: >>> atomic_abi_A7.o:(.riscv.attributes): atomic_abi=3
 
+## RISC-V tag merging for atomic_abi values A6C and invalid lead to an error.
+# RUN: llvm-mc -filetype=obj -triple=riscv64  atomic_abi_invalid.s -o 
atomic_abi_invalid.o
+# RUN: not ld.lld atomic_abi_A6C.o atomic_abi_invalid.o -o /dev/null 2>&1 | 
FileCheck %s --check-prefix=ATOMIC_ABI_INVALID --implicit-check-not=error:
+# ATOMIC_ABI_INVALID: error: unknown atomic abi for .riscv.attributes
+# ATOMIC_ABI_INVALID-NEXT: >>> atomic_abi_invalid.o:(.riscv.attributes): 
atomic_abi=42
+
 # RUN: llvm-mc -filetype=obj -triple=riscv64  atomic_abi_A6S.s -o 
atomic_abi_A6S.o
 # RUN: ld.lld atomic_abi_A6S.o atomic_abi_A6C.o -o atomic_abi_A6C_A6S
 # RUN: llvm-readobj -A atomic_abi_A6C_A6S | FileCheck %s --check-prefix=A6C_A6S
@@ -332,6 +338,9 @@
 #--- atomic_abi_A7.s
 .attribute atomic_abi, 3
 
+#--- atomic_abi_invalid.s
+.attribute atomic_abi, 42
+
 #  UNKNOWN_NONE: BuildAttributes {
 # UNKNOWN_NONE-NEXT:   FormatVersion: 0x41
 # UNKNOWN_NONE-NEXT:   Section 1 {

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)

2024-07-02 Thread Shaw Young via llvm-branch-commits

https://github.com/shawbyoung edited 
https://github.com/llvm/llvm-project/pull/96596
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)

2024-07-02 Thread Shaw Young via llvm-branch-commits

https://github.com/shawbyoung ready_for_review 
https://github.com/llvm/llvm-project/pull/96596
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Add subtarget feature for global atomic fadd denormal support (PR #96443)

2024-07-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96443

>From b5be20b30c144b22b752d438fad1e7e8c351864d Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sun, 23 Jun 2024 16:44:08 +0200
Subject: [PATCH 1/3] AMDGPU: Add subtarget feature for global atomic fadd
 denormal support

Not sure what the behavior for gfx90a is. The SPG says it always flushes.
The instruction documentation says it does not.
---
 llvm/lib/Target/AMDGPU/AMDGPU.td  | 14 --
 llvm/lib/Target/AMDGPU/GCNSubtarget.h |  7 +++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 3f35db8883716..51c077598df74 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -788,6 +788,13 @@ def FeatureFlatAtomicFaddF32Inst
   "Has flat_atomic_add_f32 instruction"
 >;
 
+def FeatureMemoryAtomicFaddF32DenormalSupport
+  : SubtargetFeature<"memory-atomic-fadd-f32-denormal-support",
+  "HasAtomicMemoryAtomicFaddF32DenormalSupport",
+  "true",
+  "global/flat/buffer atomic fadd for float supports denormal handling"
+>;
+
 def FeatureAgentScopeFineGrainedRemoteMemoryAtomics
   : SubtargetFeature<"agent-scope-fine-grained-remote-memory-atomics",
   "HasAgentScopeFineGrainedRemoteMemoryAtomics",
@@ -1427,7 +1434,8 @@ def FeatureISAVersion9_4_Common : FeatureSet<
FeatureKernargPreload,
FeatureAtomicFMinFMaxF64GlobalInsts,
FeatureAtomicFMinFMaxF64FlatInsts,
-   FeatureAgentScopeFineGrainedRemoteMemoryAtomics
+   FeatureAgentScopeFineGrainedRemoteMemoryAtomics,
+   FeatureMemoryAtomicFaddF32DenormalSupport
]>;
 
 def FeatureISAVersion9_4_0 : FeatureSet<
@@ -1631,7 +1639,9 @@ def FeatureISAVersion12 : FeatureSet<
FeatureScalarDwordx3Loads,
FeatureDPPSrc1SGPR,
FeatureMaxHardClauseLength32,
-   Feature1_5xVGPRs]>;
+   Feature1_5xVGPRs,
+   FeatureMemoryAtomicFaddF32DenormalSupport]>;
+   ]>;
 
 def FeatureISAVersion12_Generic: FeatureSet<
   !listconcat(FeatureISAVersion12.Features,
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h 
b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index 9e2a316a9ed28..db0b2b67a0388 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -167,6 +167,7 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
   bool HasAtomicFlatPkAdd16Insts = false;
   bool HasAtomicFaddRtnInsts = false;
   bool HasAtomicFaddNoRtnInsts = false;
+  bool HasAtomicMemoryAtomicFaddF32DenormalSupport = false;
   bool HasAtomicBufferGlobalPkAddF16NoRtnInsts = false;
   bool HasAtomicBufferGlobalPkAddF16Insts = false;
   bool HasAtomicCSubNoRtnInsts = false;
@@ -872,6 +873,12 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
 
   bool hasFlatAtomicFaddF32Inst() const { return HasFlatAtomicFaddF32Inst; }
 
+  /// \return true if the target's flat, global, and buffer atomic fadd for
+  /// float supports denormal handling.
+  bool hasMemoryAtomicFaddF32DenormalSupport() const {
+return HasAtomicMemoryAtomicFaddF32DenormalSupport;
+  }
+
   /// \return true if atomic operations targeting fine-grained memory work
   /// correctly at device scope, in allocations in host or peer PCIe device
   /// memory.

>From 17974fffac838a1120c7a78e404e9e4fd62e49be Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 24 Jun 2024 12:10:37 +0200
Subject: [PATCH 2/3] Add to gfx11.

RDNA 3 manual says "Floating-point addition handles NAN/INF/denorm"
thought I'm not sure I trust it.
---
 llvm/lib/Target/AMDGPU/AMDGPU.td | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 51c077598df74..370992eb81ff3 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -1547,7 +1547,8 @@ def FeatureISAVersion11_Common : FeatureSet<
FeatureFlatAtomicFaddF32Inst,
FeatureImageInsts,
FeaturePackedTID,
-   FeatureVcmpxPermlaneHazard]>;
+   FeatureVcmpxPermlaneHazard,
+   FeatureMemoryAtomicFaddF32DenormalSupport]>;
 
 // There are few workarounds that need to be
 // added to all targets. This pessimizes codegen
@@ -1640,7 +1641,7 @@ def FeatureISAVersion12 : FeatureSet<
FeatureDPPSrc1SGPR,
FeatureMaxHardClauseLength32,
Feature1_5xVGPRs,
-   FeatureMemoryAtomicFaddF32DenormalSupport]>;
+   FeatureMemoryAtomicFaddF32DenormalSupport
]>;
 
 def FeatureISAVersion12_Generic: FeatureSet<

>From 1a5d8b8dd40577164e3d0b80e64df2851d84c1c2 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 26 Jun 2024 11:30:51 +0200
Subject: [PATCH 3/3] Rename

---
 llvm/lib/Target/AMDGPU/AMDGPU.td  | 10 +-
 llvm/lib/Target/AMDGPU/GCNSubtarget.h |  4 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 370992eb81ff3..bea233bfb27bd 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -78

[llvm-branch-commits] [llvm] AMDGPU: Add subtarget feature for memory atomic fadd f64 (PR #96444)

2024-07-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96444

>From ece323988a3a47cfa5a1332620100d77a9cbd56a Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sun, 23 Jun 2024 17:07:53 +0200
Subject: [PATCH] AMDGPU: Add subtarget feature for memory atomic fadd f64

---
 llvm/lib/Target/AMDGPU/AMDGPU.td   | 21 ++---
 llvm/lib/Target/AMDGPU/BUFInstructions.td  | 10 ++
 llvm/lib/Target/AMDGPU/FLATInstructions.td |  6 +++---
 llvm/lib/Target/AMDGPU/GCNSubtarget.h  | 10 +++---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp  |  2 +-
 5 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index bea233bfb27bd..94e8e77b3c052 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -788,6 +788,13 @@ def FeatureFlatAtomicFaddF32Inst
   "Has flat_atomic_add_f32 instruction"
 >;
 
+def FeatureFlatBufferGlobalAtomicFaddF64Inst
+  : SubtargetFeature<"flat-buffer-global-fadd-f64-inst",
+  "HasFlatBufferGlobalAtomicFaddF64Inst",
+  "true",
+  "Has flat, buffer, and global instructions for f64 atomic fadd"
+>;
+
 def FeatureMemoryAtomicFAddF32DenormalSupport
   : SubtargetFeature<"memory-atomic-fadd-f32-denormal-support",
   "HasMemoryAtomicFaddF32DenormalSupport",
@@ -1390,7 +1397,8 @@ def FeatureISAVersion9_0_A : FeatureSet<
  FeatureBackOffBarrier,
  FeatureKernargPreload,
  FeatureAtomicFMinFMaxF64GlobalInsts,
- FeatureAtomicFMinFMaxF64FlatInsts
+ FeatureAtomicFMinFMaxF64FlatInsts,
+ FeatureFlatBufferGlobalAtomicFaddF64Inst
  ])>;
 
 def FeatureISAVersion9_0_C : FeatureSet<
@@ -1435,7 +1443,8 @@ def FeatureISAVersion9_4_Common : FeatureSet<
FeatureAtomicFMinFMaxF64GlobalInsts,
FeatureAtomicFMinFMaxF64FlatInsts,
FeatureAgentScopeFineGrainedRemoteMemoryAtomics,
-   FeatureMemoryAtomicFAddF32DenormalSupport
+   FeatureMemoryAtomicFAddF32DenormalSupport,
+   FeatureFlatBufferGlobalAtomicFaddF64Inst
]>;
 
 def FeatureISAVersion9_4_0 : FeatureSet<
@@ -1932,11 +1941,9 @@ def isGFX12Plus :
 def HasFlatAddressSpace : Predicate<"Subtarget->hasFlatAddressSpace()">,
   AssemblerPredicate<(all_of FeatureFlatAddressSpace)>;
 
-
-def HasBufferFlatGlobalAtomicsF64 : // FIXME: Rename to show it's only for fadd
-  Predicate<"Subtarget->hasBufferFlatGlobalAtomicsF64()">,
-  // FIXME: This is too coarse, and working around using pseudo's predicates 
on real instruction.
-  AssemblerPredicate<(any_of FeatureGFX90AInsts, FeatureGFX10Insts, 
FeatureSouthernIslands, FeatureSeaIslands)>;
+def HasFlatBufferGlobalAtomicFaddF64Inst :
+  Predicate<"Subtarget->hasFlatBufferGlobalAtomicFaddF64Inst()">,
+  AssemblerPredicate<(any_of FeatureFlatBufferGlobalAtomicFaddF64Inst)>;
 
 def HasAtomicFMinFMaxF32GlobalInsts :
   Predicate<"Subtarget->hasAtomicFMinFMaxF32GlobalInsts()">,
diff --git a/llvm/lib/Target/AMDGPU/BUFInstructions.td 
b/llvm/lib/Target/AMDGPU/BUFInstructions.td
index 3b8d94b744000..a904c8483dbf5 100644
--- a/llvm/lib/Target/AMDGPU/BUFInstructions.td
+++ b/llvm/lib/Target/AMDGPU/BUFInstructions.td
@@ -1312,14 +1312,16 @@ let SubtargetPredicate = isGFX90APlus in {
   }
 } // End SubtargetPredicate = isGFX90APlus
 
-let SubtargetPredicate = HasBufferFlatGlobalAtomicsF64 in {
+let SubtargetPredicate = HasFlatBufferGlobalAtomicFaddF64Inst in {
   defm BUFFER_ATOMIC_ADD_F64 : MUBUF_Pseudo_Atomics<"buffer_atomic_add_f64", 
VReg_64, f64>;
+} // End SubtargetPredicate = HasFlatBufferGlobalAtomicFaddF64Inst
 
+let SubtargetPredicate = HasAtomicFMinFMaxF64GlobalInsts in {
   // Note the names can be buffer_atomic_fmin_x2/buffer_atomic_fmax_x2
   // depending on some subtargets.
   defm BUFFER_ATOMIC_MIN_F64 : MUBUF_Pseudo_Atomics<"buffer_atomic_min_f64", 
VReg_64, f64>;
   defm BUFFER_ATOMIC_MAX_F64 : MUBUF_Pseudo_Atomics<"buffer_atomic_max_f64", 
VReg_64, f64>;
-} // End SubtargetPredicate = HasBufferFlatGlobalAtomicsF64
+}
 
 def BUFFER_INV : MUBUF_Invalidate<"buffer_inv"> {
   let SubtargetPredicate = isGFX940Plus;
@@ -1836,9 +1838,9 @@ let SubtargetPredicate = 
HasAtomicBufferGlobalPkAddF16Insts in {
   defm : SIBufferAtomicPat<"SIbuffer_atomic_fadd", v2f16, 
"BUFFER_ATOMIC_PK_ADD_F16", ["ret"]>;
 } // End SubtargetPredicate = HasAtomicBufferGlobalPkAddF16Insts
 
-let SubtargetPredicate = HasBufferFlatGlobalAtomicsF64 in {
+let SubtargetPredicate = HasFlatBufferGlobalAtomicFaddF64Inst in {
   defm : SIBufferAtomicPat<"SIbuffer_atomic_fadd", f64, 
"BUFFER_ATOMIC_ADD_F64">;
-} // End SubtargetPredicate = HasBufferFlatGlobalAtomicsF64
+} // End SubtargetPredicate = HasFlatBufferGlobalAtomicFaddF64Inst
 
 let SubtargetPredicate = HasAtomicFMinFMaxF64GlobalInsts in {
   defm : SIBufferAtomicPat<"SIbuffer_atomic_fmin", f64, 
"BUFFER_ATOMIC_MIN_F64">;
diff --git a/llvm/lib/Target/AMDGPU/FLATInstructions.td 
b/llvm/lib/Target/AMDGPU/FLATInstructions.td
index 4bf8f20269a15..16dc019ede810 100644
--- a/llvm/lib/Target/AMDGPU/FLATInstructions.td

[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins (PR #96873)

2024-07-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96873

>From 926b7d46672ea5bc43494909c95ea54fd33ebf18 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 26 Jun 2024 19:12:59 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from
 {global|flat}_atomic_fadd_v2f16 builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp   | 20 ++-
 .../builtins-fp-atomics-gfx12.cl  |  9 ++---
 .../builtins-fp-atomics-gfx90a.cl |  2 +-
 .../builtins-fp-atomics-gfx940.cl |  3 ++-
 4 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 39b20f750630c..c423c39dfca97 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18641,22 +18641,15 @@ Value 
*CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
 return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32: {
 Intrinsic::ID IID;
 llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
 switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
-  ArgTy = llvm::FixedVectorType::get(
-  llvm::Type::getHalfTy(getLLVMContext()), 2);
-  IID = Intrinsic::amdgcn_global_atomic_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   IID = Intrinsic::amdgcn_global_atomic_fmin;
   break;
@@ -18676,11 +18669,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   ArgTy = llvm::Type::getFloatTy(getLLVMContext());
   IID = Intrinsic::amdgcn_flat_atomic_fadd;
   break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
-  ArgTy = llvm::FixedVectorType::get(
-  llvm::Type::getHalfTy(getLLVMContext()), 2);
-  IID = Intrinsic::amdgcn_flat_atomic_fadd;
-  break;
 }
 llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
 llvm::Value *Val = EmitScalarExpr(E->getArg(1));
@@ -19073,7 +19061,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_ds_fminf:
   case AMDGPU::BI__builtin_amdgcn_ds_fmaxf:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: {
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19091,6 +19081,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl 
b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
index 6b8a6d14575db..07e63a8711c7f 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
@@ -48,7 +48,8 @@ void test_local_add_2f16_noret(__local half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_flat_add_2f16
-// CHECK: call <2 x half> @llvm.amdgcn.flat.atomic.fadd.v2f16.p0.v2f16(ptr 
%{{.*}}, <2 x half> %{{.*}})
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} 
syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX12-LABEL:  test_flat_add_2f16
 // GFX12: flat_atomic_pk_add_f16
 half2 test_flat_add_2f16(__generic half2 *addr, half2 x) {
@@ -64,7 +65,8 @@ short2 test_flat_add_2bf16(__generic short2 *addr, short2 x) {
 }
 
 // CHECK-LABEL: test_global_add_half2
-// CHECK: call <2 x half> @llvm.amdgcn.global.atomic.fadd.v2f16.p1.v2f16(ptr 
addrspace(1) %{{.*}}, <2 x half> %{{.*}})
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr addrspace(1) %{{.+}}, <2 x half> 
%{{.+}} syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory 
!{{[0-9]+$}}
+
 // GFX12-LABEL:  test_global_add_half2
 // GFX12:  global_atomic_pk_add_f16 v2, v[0:1], v2, off

[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)

2024-07-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96872

>From 3c66afc298a4dfbd4dfcb5e978c12b94f29cb458 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 11 Jun 2024 10:58:44 +0200
Subject: [PATCH 1/2] clang/AMDGPU: Emit atomicrmw for
 __builtin_amdgcn_global_atomic_fadd_{f32|f64}

Need to emit syncscope and new metadata to get the native instruction,
most of the time.
---
 clang/lib/CodeGen/CGBuiltin.cpp   | 39 +--
 .../CodeGenOpenCL/builtins-amdgcn-gfx11.cl|  2 +-
 .../builtins-fp-atomics-gfx12.cl  |  4 +-
 .../builtins-fp-atomics-gfx90a.cl |  4 +-
 .../builtins-fp-atomics-gfx940.cl |  4 +-
 5 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index ed37267efe715..d6ab6b25e987e 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -58,6 +58,7 @@
 #include "llvm/IR/MDBuilder.h"
 #include "llvm/IR/MatrixBuilder.h"
 #include "llvm/IR/MemoryModelRelaxationAnnotations.h"
+#include "llvm/Support/AMDGPUAddrSpace.h"
 #include "llvm/Support/ConvertUTF.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/ScopedPrinter.h"
@@ -18640,8 +18641,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
 return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
@@ -18653,18 +18652,11 @@ Value 
*CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 Intrinsic::ID IID;
 llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
 switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
-  ArgTy = llvm::Type::getFloatTy(getLLVMContext());
-  IID = Intrinsic::amdgcn_global_atomic_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   ArgTy = llvm::FixedVectorType::get(
   llvm::Type::getHalfTy(getLLVMContext()), 2);
   IID = Intrinsic::amdgcn_global_atomic_fadd;
   break;
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
-  IID = Intrinsic::amdgcn_global_atomic_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   IID = Intrinsic::amdgcn_global_atomic_fmin;
   break;
@@ -19079,7 +19071,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
   case AMDGPU::BI__builtin_amdgcn_ds_faddf:
   case AMDGPU::BI__builtin_amdgcn_ds_fminf:
-  case AMDGPU::BI__builtin_amdgcn_ds_fmaxf: {
+  case AMDGPU::BI__builtin_amdgcn_ds_fmaxf:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19095,6 +19089,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f32:
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2f16:
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
@@ -19129,8 +19125,13 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   ProcessOrderScopeAMDGCN(EmitScalarExpr(E->getArg(2)),
   EmitScalarExpr(E->getArg(3)), AO, SSID);
 } else {
-  // The ds_atomic_fadd_* builtins do not have syncscope/order arguments.
-  SSID = llvm::SyncScope::System;
+  // Most of the builtins do not have syncscope/order arguments. For DS
+  // atomics the scope doesn't really matter, as they implicitly operate at
+  // workgroup scope.
+  //
+  // The global/flat cases need to use agent scope to consistently produce
+  // the native instruction instead of a cmpxchg expansion.
+  SSID = getLLVMContext().getOrInsertSyncScopeID("agent");
   AO = AtomicOrdering::SequentiallyConsistent;
 
   // The v2bf16 builtin uses i16 instead of a natural bfloat type.
@@ -19145,6 +19146,20 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 Builder.CreateAtomicRMW(BinOp, Ptr, Val, AO, SSID);
 if (Volatile)
   RMW->setVolatile(true);
+
+unsigned AddrSpace = Ptr.getType()->getAddressSpace();
+if (AddrSpace != llvm::AMDGPUAS::LOCAL_ADDRESS) {
+  // Most targets require "amdgpu.no.fine.grained.memory" to emit the 
nativ

[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw from flat_atomic_{f32|f64} builtins (PR #96874)

2024-07-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96874

>From 3a0a8f445bb9cd82a4537689b47a61f4a85cd9e7 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 26 Jun 2024 19:15:26 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from flat_atomic_{f32|f64}
 builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp | 17 ++---
 .../CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl |  6 --
 .../CodeGenOpenCL/builtins-fp-atomics-gfx940.cl |  3 ++-
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index c423c39dfca97..54591bed6533b 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18643,10 +18643,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   }
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
 Intrinsic::ID IID;
 llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
 switch (BuiltinID) {
@@ -18656,19 +18654,12 @@ Value 
*CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
   IID = Intrinsic::amdgcn_global_atomic_fmax;
   break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
-  IID = Intrinsic::amdgcn_flat_atomic_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
   IID = Intrinsic::amdgcn_flat_atomic_fmin;
   break;
 case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
   IID = Intrinsic::amdgcn_flat_atomic_fmax;
   break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
-  ArgTy = llvm::Type::getFloatTy(getLLVMContext());
-  IID = Intrinsic::amdgcn_flat_atomic_fadd;
-  break;
 }
 llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
 llvm::Value *Val = EmitScalarExpr(E->getArg(1));
@@ -19063,7 +19054,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19083,6 +19076,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
 case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl 
b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
index cd10777dbe079..02e289427238f 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
@@ -45,7 +45,8 @@ void test_global_max_f64(__global double *addr, double x){
 }
 
 // CHECK-LABEL: test_flat_add_local_f64
-// CHECK: call double @llvm.amdgcn.flat.atomic.fadd.f64.p3.f64(ptr 
addrspace(3) %{{.*}}, double %{{.*}})
+// CHECK: = atomicrmw fadd ptr addrspace(3) %{{.+}}, double %{{.+}} 
syncscope("agent") seq_cst, align 8{{$}}
+
 // GFX90A-LABEL:  test_flat_add_local_f64$local
 // GFX90A:  ds_add_rtn_f64
 void test_flat_add_local_f64(__local double *addr, double x){
@@ -54,7 +55,8 @@ void test_flat_add_local_f64(__local double *addr, double x){
 }
 
 // CHECK-LABEL: test_flat_global_add_f64
-// CHECK: call double @llvm.amdgcn.flat.atomic.fadd.f64.p1.f64(ptr 
addrspace(1) %{{.*}}, double %{{.*}})
+// CHECK: = atomicrmw fadd ptr addrspace(1) {{.+}}, double %{{.+}} 
syncscope("agent") seq_cst, align 8, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX90A-LABEL:  test_flat_global_add_f64$local
 // GFX90A:  global_atomic_add_f64
 void test_flat_global_add_f64(__global double *addr, double x){
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx940.cl 
b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx940.cl
index 589dcd406630d..bd9b8c7268e06 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx940.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx940.cl
@@ -10,7 +10,8 @@ typedef half  _

[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16 builtins (PR #96875)

2024-07-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96875

>From e7a4d008fc1f0473a61fa77082374d968a5aec55 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 26 Jun 2024 19:34:43 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16
 builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp   | 26 ++-
 .../builtins-fp-atomics-gfx12.cl  | 24 -
 .../builtins-fp-atomics-gfx90a.cl |  6 ++---
 .../builtins-fp-atomics-gfx940.cl | 14 +++---
 4 files changed, 38 insertions(+), 32 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 54591bed6533b..1b5ac57fc0a82 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18667,22 +18667,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 CGM.getIntrinsic(IID, {ArgTy, Addr->getType(), Val->getType()});
 return Builder.CreateCall(F, {Addr, Val});
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16: {
-Intrinsic::ID IID;
-switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
-  IID = Intrinsic::amdgcn_global_atomic_fadd_v2bf16;
-  break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16:
-  IID = Intrinsic::amdgcn_flat_atomic_fadd_v2bf16;
-  break;
-}
-llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
-llvm::Value *Val = EmitScalarExpr(E->getArg(1));
-llvm::Function *F = CGM.getIntrinsic(IID, {Addr->getType()});
-return Builder.CreateCall(F, {Addr, Val});
-  }
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_i32:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_v2i32:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4i16:
@@ -19056,7 +19040,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19078,6 +19064,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
 case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
 case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16:
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
@@ -19122,7 +19110,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   AO = AtomicOrdering::Monotonic;
 
   // The v2bf16 builtin uses i16 instead of a natural bfloat type.
-  if (BuiltinID == AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16) {
+  if (BuiltinID == AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16 ||
+  BuiltinID == AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16 ||
+  BuiltinID == AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16) {
 llvm::Type *V2BF16Ty = FixedVectorType::get(
 llvm::Type::getBFloatTy(Builder.getContext()), 2);
 Val = Builder.CreateBitCast(Val, V2BF16Ty);
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl 
b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
index 07e63a8711c7f..e8b6eb57c38d7 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
@@ -11,7 +11,7 @@ typedef short __attribute__((ext_vector_type(2))) short2;
 
 // CHECK-LABEL: test_local_add_2bf16
 // CHECK: [[BC0:%.+]] = bitcast <2 x i16> {{.+}} to <2 x bfloat>
-// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr addrspace(3) %{{.+}}, <2 x bfloat> 
[[BC0]] syncscope("agent") monotonic, align 4
+// CHECK-NEXT: [[RMW:%.+]] = atomicrmw fadd ptr addrspace(3) %{{.+}}, <2 x 
bfloat> [[BC0]] syncscope("agent") monotonic, align 4
 // CHECK-NEXT: bitcast <2 x bfloat> [[RMW]] to <2 x i16>
 
 // GFX12-LABEL:  test_local_add_2bf16
@@ -48,7 +48,7 @@ void test_local_add_2f16_noret(__local half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_flat_add_2f16
-// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} 
syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} 
syncscope("agent") monotonic, align 4, !amdgpu.no.fine.grained.memory 
!{{[0-9]+$}}
 
 // GFX12-LABEL:  test_flat_add_2f

[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for flat/global atomic min/max f64 builtins (PR #96876)

2024-07-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96876

>From 953322d39c5e049ba3305457942bcdf2524ad819 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 26 Jun 2024 23:18:32 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw for flat/global atomic min/max
 f64 builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp   | 36 +--
 .../builtins-fp-atomics-gfx90a.cl | 18 ++
 2 files changed, 21 insertions(+), 33 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 1b5ac57fc0a82..b000d2d7a906a 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18641,32 +18641,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
 return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
-Intrinsic::ID IID;
-llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
-switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
-  IID = Intrinsic::amdgcn_global_atomic_fmin;
-  break;
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
-  IID = Intrinsic::amdgcn_global_atomic_fmax;
-  break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
-  IID = Intrinsic::amdgcn_flat_atomic_fmin;
-  break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
-  IID = Intrinsic::amdgcn_flat_atomic_fmax;
-  break;
-}
-llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
-llvm::Value *Val = EmitScalarExpr(E->getArg(1));
-llvm::Function *F =
-CGM.getIntrinsic(IID, {ArgTy, Addr->getType(), Val->getType()});
-return Builder.CreateCall(F, {Addr, Val});
-  }
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_i32:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_v2i32:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4i16:
@@ -19042,7 +19016,11 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19069,8 +19047,12 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
   BinOp = llvm::AtomicRMWInst::FMin;
   break;
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
 case AMDGPU::BI__builtin_amdgcn_ds_fmaxf:
   BinOp = llvm::AtomicRMWInst::FMax;
   break;
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl 
b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
index 9381ce951df3e..556e553903d1a 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
@@ -27,7 +27,8 @@ void test_global_add_half2(__global half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_global_global_min_f64
-// CHECK: call double @llvm.amdgcn.global.atomic.fmin.f64.p1.f64(ptr 
addrspace(1) %{{.*}}, double %{{.*}})
+// CHECK: = atomicrmw fmin ptr addrspace(1) {{.+}}, double %{{.+}} 
syncscope("agent") monotonic, align 8, !amdgpu.no.fine.grained.memory 
!{{[0-9]+$}}
+
 // GFX90A-LABEL:  test_global_global_min_f64$local
 // GFX90A:  global_atomic_min_f64
 void test_global_global_min_f64(__global double *addr, double x){
@@ -36,7 +37,8 @@ void test_global_global_min_f64(__global double *addr, double 
x){
 }
 
 // CHECK-LABEL: test_global_max_f64
-// CHECK: call double @llvm.amdgcn.global.atomic.fmax.f64.p1.f64(ptr 
addrspace(1) %{{.*}}, double %{{.*}})
+// CHECK: = atomicrmw fmax ptr addrspace(1) {{.+}}, double %{{.+}} 
syncscope("agent") monotonic, align 8, !amdgpu.no.fine.grained.memory 
!{{[0-9]+$}}
+
 // GFX90A-LABEL:  test_global_max_f64$local
 // GFX90A:  global_atomic_max_f64
 void test_global_max_f64(__global double *addr, double x){
@@ -65,7 +67,8 @@ void test_flat_global_add_f64(__global double *addr, doub

[llvm-branch-commits] [llvm] AMDGPU: Remove flat/global atomic fadd v2bf16 intrinsics (PR #97050)

2024-07-02 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/97050

>From 5fe43ecedbf2411d4ba6ff251a8a33d72900f264 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 27 Jun 2024 16:32:48 +0200
Subject: [PATCH] AMDGPU: Remove flat/global atomic fadd v2bf16 intrinsics

These are now fully covered by atomicrmw.
---
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   4 -
 llvm/lib/IR/AutoUpgrade.cpp   |  14 +-
 llvm/lib/Target/AMDGPU/AMDGPUInstructions.td  |   2 -
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   2 -
 .../Target/AMDGPU/AMDGPUSearchableTables.td   |   2 -
 llvm/lib/Target/AMDGPU/FLATInstructions.td|   2 -
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |   6 +-
 llvm/test/Bitcode/amdgcn-atomic.ll|  22 ++
 .../AMDGPU/GlobalISel/fp-atomics-gfx940.ll|  54 -
 .../test/CodeGen/AMDGPU/fp-atomics-gfx1200.ll | 100 -
 llvm/test/CodeGen/AMDGPU/fp-atomics-gfx940.ll | 193 --
 11 files changed, 33 insertions(+), 368 deletions(-)

diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td 
b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 71b1e832bde3c..9cf4d6352d23d 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -2907,10 +2907,6 @@ multiclass AMDGPUMFp8SmfmacIntrinsic {
 def NAME#"_"#kind : AMDGPUMFp8SmfmacIntrinsic;
 }
 
-// bf16 atomics use v2i16 argument since there is no bf16 data type in the 
llvm.
-def int_amdgcn_global_atomic_fadd_v2bf16 : AMDGPUAtomicRtn;
-def int_amdgcn_flat_atomic_fadd_v2bf16   : AMDGPUAtomicRtn;
-
 defset list AMDGPUMFMAIntrinsics940 = {
 def int_amdgcn_mfma_i32_16x16x32_i8 : AMDGPUMfmaIntrinsic;
 def int_amdgcn_mfma_i32_32x32x16_i8 : AMDGPUMfmaIntrinsic;
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 5beefaa1ec701..8faaff5636665 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -1034,7 +1034,9 @@ static bool upgradeIntrinsicFunction1(Function *F, 
Function *&NewFn,
   }
 
   if (Name.starts_with("ds.fadd") || Name.starts_with("ds.fmin") ||
-  Name.starts_with("ds.fmax")) {
+  Name.starts_with("ds.fmax") ||
+  Name.starts_with("global.atomic.fadd.v2bf16") ||
+  Name.starts_with("flat.atomic.fadd.v2bf16")) {
 // Replaced with atomicrmw fadd/fmin/fmax, so there's no new
 // declaration.
 NewFn = nullptr;
@@ -2352,7 +2354,9 @@ static Value *upgradeAMDGCNIntrinsicCall(StringRef Name, 
CallBase *CI,
   .StartsWith("ds.fmin", AtomicRMWInst::FMin)
   .StartsWith("ds.fmax", AtomicRMWInst::FMax)
   .StartsWith("atomic.inc.", AtomicRMWInst::UIncWrap)
-  .StartsWith("atomic.dec.", AtomicRMWInst::UDecWrap);
+  .StartsWith("atomic.dec.", AtomicRMWInst::UDecWrap)
+  .StartsWith("global.atomic.fadd", AtomicRMWInst::FAdd)
+  .StartsWith("flat.atomic.fadd", AtomicRMWInst::FAdd);
 
   unsigned NumOperands = CI->getNumOperands();
   if (NumOperands < 3) // Malformed bitcode.
@@ -2407,8 +2411,10 @@ static Value *upgradeAMDGCNIntrinsicCall(StringRef Name, 
CallBase *CI,
   Builder.CreateAtomicRMW(RMWOp, Ptr, Val, std::nullopt, Order, SSID);
 
   if (PtrTy->getAddressSpace() != 3) {
-RMW->setMetadata("amdgpu.no.fine.grained.memory",
- MDNode::get(F->getContext(), {}));
+MDNode *EmptyMD = MDNode::get(F->getContext(), {});
+RMW->setMetadata("amdgpu.no.fine.grained.memory", EmptyMD);
+if (RMWOp == AtomicRMWInst::FAdd && RetTy->isFloatTy())
+  RMW->setMetadata("amdgpu.ignore.denormal.mode", EmptyMD);
   }
 
   if (IsVolatile)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td 
b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
index c6dbc58395e48..db8b44149cf47 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
@@ -620,12 +620,10 @@ multiclass local_addr_space_atomic_op {
 
 defm int_amdgcn_flat_atomic_fadd : noret_op;
 defm int_amdgcn_flat_atomic_fadd : flat_addr_space_atomic_op;
-defm int_amdgcn_flat_atomic_fadd_v2bf16 : noret_op;
 defm int_amdgcn_flat_atomic_fmin : noret_op;
 defm int_amdgcn_flat_atomic_fmax : noret_op;
 defm int_amdgcn_global_atomic_fadd : global_addr_space_atomic_op;
 defm int_amdgcn_flat_atomic_fadd : global_addr_space_atomic_op;
-defm int_amdgcn_global_atomic_fadd_v2bf16 : noret_op;
 defm int_amdgcn_global_atomic_fmin : noret_op;
 defm int_amdgcn_global_atomic_fmax : noret_op;
 defm int_amdgcn_global_atomic_csub : noret_op;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index 9e7694f41d6b8..74686cd10512c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -4897,8 +4897,6 @@ AMDGPURegisterBankInfo::getInstrMapping(const 
MachineInstr &MI) const {
 case Intrinsic::amdgcn_flat_atomic_fmax:
 case Intrinsic::amdgcn_fl

[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-07-02 Thread Abid Qadeer via llvm-branch-commits


@@ -216,31 +215,50 @@ bool 
ClauseProcessor::processMotionClauses(lower::StatementContext &stmtCtx,
   if (origSymbol && fir::isTypeWithDescriptor(origSymbol.getType()))
 symAddr = origSymbol;
 
+  if (object.sym()->owner().IsDerivedType()) {
+omp::ObjectList objectList = gatherObjects(object, semaCtx);
+parentObj = objectList[0];
+parentMemberIndices.insert({parentObj.value(), {}});
+if (Fortran::semantics::IsAllocatableOrObjectPointer(
+object.sym()) ||
+memberHasAllocatableParent(object, semaCtx)) {
+  llvm::SmallVector indices =
+  generateMemberPlacementIndices(object, semaCtx);
+  symAddr = createParentSymAndGenIntermediateMaps(
+  clauseLocation, converter, objectList, indices,
+  parentMemberIndices[parentObj.value()], asFortran.str(),
+  mapTypeBits);
+}
+  }
+

abidh wrote:

There is almost same code added in ClauseProcessor.cpp above. Just wanted to 
point out in case this was not intentional.

https://github.com/llvm/llvm-project/pull/96266
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [clang][test] add testing for the AST matcher reference (PR #94248)

2024-07-02 Thread Aaron Ballman via llvm-branch-commits

AaronBallman wrote:

> Thanks.
> 
> > ... how this impacts build times for Clang itself? I'm assuming that if 
> > ASTMatchers.h isn't modified, CMake won't re-run 
> > generate_ast_matcher_doc_tests.py and so the compile time performance hit 
> > is only on full rebuilds or when changing the header?
> 
> The 'state' of the generated file is only checked when the `ASTMatchersTests` 
> target is built, because it's the only thing that depends on the generated 
> file. And the file is only generated when: the file does not exist or 
> `ASTMatchers.h` has changed (excluding transitive changes in includes) or 
> `generate_ast_matcher_doc_tests.py` has changed.

Excellent, thank you for the confirmation! That sounds reasonable to me.

https://github.com/llvm/llvm-project/pull/94248
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)

2024-07-02 Thread Shaw Young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/96596

>From 05d59574d6260b98a469921eb2fccf5398bfafb6 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 24 Jun 2024 23:00:59 -0700
Subject: [PATCH 1/9] Added call to matchWithCallsAsAnchors

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index aafffac3d4b1c..1a0e5d239d252 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -479,6 +479,9 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
 if (!YamlBF.Used && BF && !ProfiledFunctions.count(BF))
   matchProfileToFunction(YamlBF, *BF);
 
+  uint64_t MatchedWithCallsAsAnchors = 0;
+  matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
+
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 if (!YamlBF.Used && opts::Verbosity >= 1)
   errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name

>From 77ef0008f4f5987719555e6cc3e32da812ae0f31 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 24 Jun 2024 23:11:43 -0700
Subject: [PATCH 2/9] Changed CallHashToBF representation

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index 1a0e5d239d252..91b01a99c7485 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -29,6 +29,10 @@ static llvm::cl::opt
cl::desc("ignore hash while reading function profile"),
cl::Hidden, cl::cat(BoltOptCategory));
 
+llvm::cl::opt MatchWithCallsAsAnchors("match-with-calls-as-anchors",
+  cl::desc("Matches with calls as anchors"),
+  cl::Hidden, cl::cat(BoltOptCategory));
+
 llvm::cl::opt ProfileUseDFS("profile-use-dfs",
   cl::desc("use DFS order for YAML profile"),
   cl::Hidden, cl::cat(BoltOptCategory));
@@ -353,7 +357,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 llvm_unreachable("Unhandled HashFunction");
   };
 
-  std::unordered_map CallHashToBF;
+  std::unordered_map CallHashToBF;
 
   for (BinaryFunction *BF : BC.getAllBinaryFunctions()) {
 if (ProfiledFunctions.count(BF))
@@ -375,12 +379,12 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
   for (const std::string &FunctionName : FunctionNames)
 HashString.append(FunctionName);
 }
-CallHashToBF.emplace(ComputeCallHash(HashString), BF);
+CallHashToBF[ComputeCallHash(HashString)] = BF;
   }
 
   std::unordered_map ProfiledFunctionIdToName;
 
-  for (const yaml::bolt::BinaryFunctionProfile YamlBF : YamlBP.Functions)
+  for (const yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 ProfiledFunctionIdToName[YamlBF.Id] = YamlBF.Name;
 
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions) {
@@ -401,7 +405,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 auto It = CallHashToBF.find(Hash);
 if (It == CallHashToBF.end())
   continue;
-matchProfileToFunction(YamlBF, It->second);
+matchProfileToFunction(YamlBF, *It->second);
 ++MatchedWithCallsAsAnchors;
   }
 }
@@ -480,7 +484,8 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
   matchProfileToFunction(YamlBF, *BF);
 
   uint64_t MatchedWithCallsAsAnchors = 0;
-  matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
+  if (opts::MatchWithCallsAsAnchors)
+matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
 
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 if (!YamlBF.Used && opts::Verbosity >= 1)

>From ea7cb68ab9e8e158412c2e752986968968a60d93 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Tue, 25 Jun 2024 09:28:39 -0700
Subject: [PATCH 3/9] Changed BF called FunctionNames to multiset

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index 91b01a99c7485..3b3d73f7af023 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -365,7 +365,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 
 std::string HashString;
 for (const auto &BB : BF->blocks()) {
-  std::set FunctionNames;
+  std::multiset FunctionNames;
   for (const MCInst &Instr : BB) {
 // Skip non-call instructions.
 if (!BC.MIB->isCall(Instr))
@@ -397,9 +397,8 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 std::string &FunctionName = ProfiledFunctionIdToName[CallSite.DestId];
 FunctionNames.insert(FunctionName);
   

[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)

2024-07-02 Thread Shaw Young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/96596

>From 05d59574d6260b98a469921eb2fccf5398bfafb6 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 24 Jun 2024 23:00:59 -0700
Subject: [PATCH 1/9] Added call to matchWithCallsAsAnchors

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index aafffac3d4b1c..1a0e5d239d252 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -479,6 +479,9 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
 if (!YamlBF.Used && BF && !ProfiledFunctions.count(BF))
   matchProfileToFunction(YamlBF, *BF);
 
+  uint64_t MatchedWithCallsAsAnchors = 0;
+  matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
+
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 if (!YamlBF.Used && opts::Verbosity >= 1)
   errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name

>From 77ef0008f4f5987719555e6cc3e32da812ae0f31 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 24 Jun 2024 23:11:43 -0700
Subject: [PATCH 2/9] Changed CallHashToBF representation

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index 1a0e5d239d252..91b01a99c7485 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -29,6 +29,10 @@ static llvm::cl::opt
cl::desc("ignore hash while reading function profile"),
cl::Hidden, cl::cat(BoltOptCategory));
 
+llvm::cl::opt MatchWithCallsAsAnchors("match-with-calls-as-anchors",
+  cl::desc("Matches with calls as anchors"),
+  cl::Hidden, cl::cat(BoltOptCategory));
+
 llvm::cl::opt ProfileUseDFS("profile-use-dfs",
   cl::desc("use DFS order for YAML profile"),
   cl::Hidden, cl::cat(BoltOptCategory));
@@ -353,7 +357,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 llvm_unreachable("Unhandled HashFunction");
   };
 
-  std::unordered_map CallHashToBF;
+  std::unordered_map CallHashToBF;
 
   for (BinaryFunction *BF : BC.getAllBinaryFunctions()) {
 if (ProfiledFunctions.count(BF))
@@ -375,12 +379,12 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
   for (const std::string &FunctionName : FunctionNames)
 HashString.append(FunctionName);
 }
-CallHashToBF.emplace(ComputeCallHash(HashString), BF);
+CallHashToBF[ComputeCallHash(HashString)] = BF;
   }
 
   std::unordered_map ProfiledFunctionIdToName;
 
-  for (const yaml::bolt::BinaryFunctionProfile YamlBF : YamlBP.Functions)
+  for (const yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 ProfiledFunctionIdToName[YamlBF.Id] = YamlBF.Name;
 
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions) {
@@ -401,7 +405,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 auto It = CallHashToBF.find(Hash);
 if (It == CallHashToBF.end())
   continue;
-matchProfileToFunction(YamlBF, It->second);
+matchProfileToFunction(YamlBF, *It->second);
 ++MatchedWithCallsAsAnchors;
   }
 }
@@ -480,7 +484,8 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
   matchProfileToFunction(YamlBF, *BF);
 
   uint64_t MatchedWithCallsAsAnchors = 0;
-  matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
+  if (opts::MatchWithCallsAsAnchors)
+matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
 
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 if (!YamlBF.Used && opts::Verbosity >= 1)

>From ea7cb68ab9e8e158412c2e752986968968a60d93 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Tue, 25 Jun 2024 09:28:39 -0700
Subject: [PATCH 3/9] Changed BF called FunctionNames to multiset

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index 91b01a99c7485..3b3d73f7af023 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -365,7 +365,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 
 std::string HashString;
 for (const auto &BB : BF->blocks()) {
-  std::set FunctionNames;
+  std::multiset FunctionNames;
   for (const MCInst &Instr : BB) {
 // Skip non-call instructions.
 if (!BC.MIB->isCall(Instr))
@@ -397,9 +397,8 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 std::string &FunctionName = ProfiledFunctionIdToName[CallSite.DestId];
 FunctionNames.insert(FunctionName);
   

[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)

2024-07-02 Thread Shaw Young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/96596

>From 05d59574d6260b98a469921eb2fccf5398bfafb6 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 24 Jun 2024 23:00:59 -0700
Subject: [PATCH 01/10] Added call to matchWithCallsAsAnchors

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index aafffac3d4b1c..1a0e5d239d252 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -479,6 +479,9 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
 if (!YamlBF.Used && BF && !ProfiledFunctions.count(BF))
   matchProfileToFunction(YamlBF, *BF);
 
+  uint64_t MatchedWithCallsAsAnchors = 0;
+  matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
+
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 if (!YamlBF.Used && opts::Verbosity >= 1)
   errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name

>From 77ef0008f4f5987719555e6cc3e32da812ae0f31 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 24 Jun 2024 23:11:43 -0700
Subject: [PATCH 02/10] Changed CallHashToBF representation

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index 1a0e5d239d252..91b01a99c7485 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -29,6 +29,10 @@ static llvm::cl::opt
cl::desc("ignore hash while reading function profile"),
cl::Hidden, cl::cat(BoltOptCategory));
 
+llvm::cl::opt MatchWithCallsAsAnchors("match-with-calls-as-anchors",
+  cl::desc("Matches with calls as anchors"),
+  cl::Hidden, cl::cat(BoltOptCategory));
+
 llvm::cl::opt ProfileUseDFS("profile-use-dfs",
   cl::desc("use DFS order for YAML profile"),
   cl::Hidden, cl::cat(BoltOptCategory));
@@ -353,7 +357,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 llvm_unreachable("Unhandled HashFunction");
   };
 
-  std::unordered_map CallHashToBF;
+  std::unordered_map CallHashToBF;
 
   for (BinaryFunction *BF : BC.getAllBinaryFunctions()) {
 if (ProfiledFunctions.count(BF))
@@ -375,12 +379,12 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
   for (const std::string &FunctionName : FunctionNames)
 HashString.append(FunctionName);
 }
-CallHashToBF.emplace(ComputeCallHash(HashString), BF);
+CallHashToBF[ComputeCallHash(HashString)] = BF;
   }
 
   std::unordered_map ProfiledFunctionIdToName;
 
-  for (const yaml::bolt::BinaryFunctionProfile YamlBF : YamlBP.Functions)
+  for (const yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 ProfiledFunctionIdToName[YamlBF.Id] = YamlBF.Name;
 
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions) {
@@ -401,7 +405,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 auto It = CallHashToBF.find(Hash);
 if (It == CallHashToBF.end())
   continue;
-matchProfileToFunction(YamlBF, It->second);
+matchProfileToFunction(YamlBF, *It->second);
 ++MatchedWithCallsAsAnchors;
   }
 }
@@ -480,7 +484,8 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
   matchProfileToFunction(YamlBF, *BF);
 
   uint64_t MatchedWithCallsAsAnchors = 0;
-  matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
+  if (opts::MatchWithCallsAsAnchors)
+matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
 
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 if (!YamlBF.Used && opts::Verbosity >= 1)

>From ea7cb68ab9e8e158412c2e752986968968a60d93 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Tue, 25 Jun 2024 09:28:39 -0700
Subject: [PATCH 03/10] Changed BF called FunctionNames to multiset

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index 91b01a99c7485..3b3d73f7af023 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -365,7 +365,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 
 std::string HashString;
 for (const auto &BB : BF->blocks()) {
-  std::set FunctionNames;
+  std::multiset FunctionNames;
   for (const MCInst &Instr : BB) {
 // Skip non-call instructions.
 if (!BC.MIB->isCall(Instr))
@@ -397,9 +397,8 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 std::string &FunctionName = ProfiledFunctionIdToName[CallSite.DestId];
 FunctionNames.insert(FunctionName);
 

[llvm-branch-commits] [mlir] 6d4ac22 - Revert "Fix block merging (#96871)"

2024-07-02 Thread via llvm-branch-commits

Author: Mehdi Amini
Date: 2024-07-02T20:56:56+02:00
New Revision: 6d4ac22a2a982473a37fbca529296db5a8b5efc2

URL: 
https://github.com/llvm/llvm-project/commit/6d4ac22a2a982473a37fbca529296db5a8b5efc2
DIFF: 
https://github.com/llvm/llvm-project/commit/6d4ac22a2a982473a37fbca529296db5a8b5efc2.diff

LOG: Revert "Fix block merging (#96871)"

This reverts commit 6c3897d90eda4c39789ac9f4efa51db46734a249.

Added: 


Modified: 

mlir/lib/Dialect/Bufferization/Transforms/BufferDeallocationSimplification.cpp
mlir/lib/Transforms/Utils/RegionUtils.cpp

mlir/test/Dialect/Bufferization/Transforms/OwnershipBasedBufferDeallocation/dealloc-branchop-interface.mlir
mlir/test/Dialect/Linalg/detensorize_entry_block.mlir
mlir/test/Dialect/Linalg/detensorize_if.mlir
mlir/test/Dialect/Linalg/detensorize_while.mlir
mlir/test/Dialect/Linalg/detensorize_while_impure_cf.mlir
mlir/test/Dialect/Linalg/detensorize_while_pure_cf.mlir
mlir/test/Transforms/canonicalize-block-merge.mlir
mlir/test/Transforms/canonicalize-dce.mlir
mlir/test/Transforms/make-isolated-from-above.mlir

Removed: 
mlir/test/Transforms/test-canonicalize-merge-large-blocks.mlir



diff  --git 
a/mlir/lib/Dialect/Bufferization/Transforms/BufferDeallocationSimplification.cpp
 
b/mlir/lib/Dialect/Bufferization/Transforms/BufferDeallocationSimplification.cpp
index 5227b22653eef..954485cfede3d 100644
--- 
a/mlir/lib/Dialect/Bufferization/Transforms/BufferDeallocationSimplification.cpp
+++ 
b/mlir/lib/Dialect/Bufferization/Transforms/BufferDeallocationSimplification.cpp
@@ -463,15 +463,10 @@ struct BufferDeallocationSimplificationPass
  SplitDeallocWhenNotAliasingAnyOther,
  RetainedMemrefAliasingAlwaysDeallocatedMemref>(&getContext(),
 analysis);
-// We don't want that the block structure changes invalidating the
-// `BufferOriginAnalysis` so we apply the rewrites witha `Normal` level of
-// region simplification
-GreedyRewriteConfig config;
-config.enableRegionSimplification = GreedySimplifyRegionLevel::Normal;
 populateDeallocOpCanonicalizationPatterns(patterns, &getContext());
 
-if (failed(applyPatternsAndFoldGreedily(getOperation(), 
std::move(patterns),
-config)))
+if (failed(
+applyPatternsAndFoldGreedily(getOperation(), std::move(patterns
   signalPassFailure();
   }
 };

diff  --git a/mlir/lib/Transforms/Utils/RegionUtils.cpp 
b/mlir/lib/Transforms/Utils/RegionUtils.cpp
index 412e2456295ad..4c0f15bafbaba 100644
--- a/mlir/lib/Transforms/Utils/RegionUtils.cpp
+++ b/mlir/lib/Transforms/Utils/RegionUtils.cpp
@@ -9,7 +9,6 @@
 #include "mlir/Transforms/RegionUtils.h"
 #include "mlir/Analysis/TopologicalSortUtils.h"
 #include "mlir/IR/Block.h"
-#include "mlir/IR/BuiltinOps.h"
 #include "mlir/IR/IRMapping.h"
 #include "mlir/IR/Operation.h"
 #include "mlir/IR/PatternMatch.h"
@@ -17,15 +16,11 @@
 #include "mlir/IR/Value.h"
 #include "mlir/Interfaces/ControlFlowInterfaces.h"
 #include "mlir/Interfaces/SideEffectInterfaces.h"
-#include "mlir/Support/LogicalResult.h"
 
 #include "llvm/ADT/DepthFirstIterator.h"
 #include "llvm/ADT/PostOrderIterator.h"
-#include "llvm/ADT/STLExtras.h"
-#include "llvm/ADT/SmallSet.h"
 
 #include 
-#include 
 
 using namespace mlir;
 
@@ -704,8 +699,9 @@ LogicalResult BlockMergeCluster::merge(RewriterBase 
&rewriter) {
   blockIterators.push_back(mergeBlock->begin());
 
 // Update each of the predecessor terminators with the new arguments.
-SmallVector, 2> newArguments(1 + 
blocksToMerge.size(),
-   SmallVector());
+SmallVector, 2> newArguments(
+1 + blocksToMerge.size(),
+SmallVector(operandsToMerge.size()));
 unsigned curOpIndex = 0;
 for (const auto &it : llvm::enumerate(operandsToMerge)) {
   unsigned nextOpOffset = it.value().first - curOpIndex;
@@ -716,22 +712,13 @@ LogicalResult BlockMergeCluster::merge(RewriterBase 
&rewriter) {
 Block::iterator &blockIter = blockIterators[i];
 std::advance(blockIter, nextOpOffset);
 auto &operand = blockIter->getOpOperand(it.value().second);
-Value operandVal = operand.get();
-Value *it = std::find(newArguments[i].begin(), newArguments[i].end(),
-  operandVal);
-if (it == newArguments[i].end()) {
-  newArguments[i].push_back(operandVal);
-  // Update the operand and insert an argument if this is the leader.
-  if (i == 0) {
-operand.set(leaderBlock->addArgument(operandVal.getType(),
- operandVal.getLoc()));
-  }
-} else if (i == 0) {
-  // If this is the leader, update the operand but do not insert a new
- 

[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-07-02 Thread via llvm-branch-commits


@@ -216,31 +215,50 @@ bool 
ClauseProcessor::processMotionClauses(lower::StatementContext &stmtCtx,
   if (origSymbol && fir::isTypeWithDescriptor(origSymbol.getType()))
 symAddr = origSymbol;
 
+  if (object.sym()->owner().IsDerivedType()) {
+omp::ObjectList objectList = gatherObjects(object, semaCtx);
+parentObj = objectList[0];
+parentMemberIndices.insert({parentObj.value(), {}});
+if (Fortran::semantics::IsAllocatableOrObjectPointer(
+object.sym()) ||
+memberHasAllocatableParent(object, semaCtx)) {
+  llvm::SmallVector indices =
+  generateMemberPlacementIndices(object, semaCtx);
+  symAddr = createParentSymAndGenIntermediateMaps(
+  clauseLocation, converter, objectList, indices,
+  parentMemberIndices[parentObj.value()], asFortran.str(),
+  mapTypeBits);
+}
+  }
+

agozillon wrote:

Thank you! It's intentional unfortunately. I will create a PR after this lands 
to align the two functions, so replication isn't required anymore. They're two 
seperate types of map clauses (motion clauses and regular map clauses, with 
motion clauses being a subset of the map clauses) that appear on different 
directives. When the motion clauses were added I think it was kept seperate 
incase it had any special considerations, but as we've been moving forward with 
the map work it's just been replicating the code across both. 

So merging them into one is something I'd very much like to do. It might be out 
of the scope of this PR though. So I am happy to do it in a follow up PR once 
this lands!

https://github.com/llvm/llvm-project/pull/96266
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AArch64] Only create called thunks when hardening against SLS (PR #97472)

2024-07-02 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko created 
https://github.com/llvm/llvm-project/pull/97472

In preparation for implementing hardening of BLRA* instructions, restrict thunk 
function generation to only the thunks being actually called from any function. 
As described in the existing comments, emitting all possible thunks for BLRAA 
and BLRAB instructions would mean adding about 1800 functions in total, most of 
which are likely not to be called.

This commit merges AArch64SLSHardening class into SLSBLRThunkInserter, so 
thunks can be created as needed while rewriting a machine function. The usages 
of TII, TRI and ST fields of AArch64SLSHardening class are replaced with 
requesting them in-place, as ThunkInserter assumes multiple "entry points" in 
contrast to the only runOnMachineFunction method of AArch64SLSHardening.

The runOnMachineFunction method essentially replaces pre-existing insertThunks 
implementation as there is no more need to insert all possible thunks 
unconditionally. Instead, thunks are created on first use from inside of 
insertThunks method.

>From a246cfe705b326c520d6b36882a17bd90b622e5d Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Fri, 28 Jun 2024 21:50:24 +0300
Subject: [PATCH] [AArch64] Only create called thunks when hardening against
 SLS

In preparation for implementing hardening of BLRA* instructions,
restrict thunk function generation to only the thunks being actually
called from any function. As described in the existing comments,
emitting all possible thunks for BLRAA and BLRAB instructions would
mean adding about 1800 functions in total, most of which are likely
not to be called.

This commit merges AArch64SLSHardening class into SLSBLRThunkInserter,
so thunks can be created as needed while rewriting a machine function.
The usages of TII, TRI and ST fields of AArch64SLSHardening class are
replaced with requesting them in-place, as ThunkInserter assumes
multiple "entry points" in contrast to the only runOnMachineFunction
method of AArch64SLSHardening.

The runOnMachineFunction method essentially replaces pre-existing
insertThunks implementation as there is no more need to insert all
possible thunks unconditionally. Instead, thunks are created on first
use from inside of insertThunks method.
---
 llvm/lib/Target/AArch64/AArch64.h |   1 -
 .../Target/AArch64/AArch64SLSHardening.cpp| 188 +++---
 .../Target/AArch64/AArch64TargetMachine.cpp   |   1 -
 llvm/test/CodeGen/AArch64/O0-pipeline.ll  |   1 -
 llvm/test/CodeGen/AArch64/O3-pipeline.ll  |   1 -
 .../AArch64/arm64-opt-remarks-lazy-bfi.ll |   8 -
 .../speculation-hardening-sls-blr-bti.mir |  20 --
 .../AArch64/speculation-hardening-sls-blr.mir |  20 +-
 8 files changed, 75 insertions(+), 165 deletions(-)

diff --git a/llvm/lib/Target/AArch64/AArch64.h 
b/llvm/lib/Target/AArch64/AArch64.h
index 6f2aeb83a451a..66ad701d83958 100644
--- a/llvm/lib/Target/AArch64/AArch64.h
+++ b/llvm/lib/Target/AArch64/AArch64.h
@@ -40,7 +40,6 @@ FunctionPass *createAArch64ISelDag(AArch64TargetMachine &TM,
 FunctionPass *createAArch64StorePairSuppressPass();
 FunctionPass *createAArch64ExpandPseudoPass();
 FunctionPass *createAArch64SLSHardeningPass();
-FunctionPass *createAArch64IndirectThunks();
 FunctionPass *createAArch64SpeculationHardeningPass();
 FunctionPass *createAArch64LoadStoreOptimizationPass();
 ModulePass *createAArch64LowerHomogeneousPrologEpilogPass();
diff --git a/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp 
b/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp
index d35386eaeab12..b4ebd7d5377c2 100644
--- a/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp
+++ b/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp
@@ -13,20 +13,16 @@
 
 #include "AArch64InstrInfo.h"
 #include "AArch64Subtarget.h"
-#include "Utils/AArch64BaseInfo.h"
 #include "llvm/CodeGen/IndirectThunks.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/MachineFunction.h"
-#include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineInstr.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/CodeGen/MachineOperand.h"
-#include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/RegisterScavenging.h"
 #include "llvm/IR/DebugLoc.h"
 #include "llvm/Pass.h"
-#include "llvm/Support/CodeGen.h"
-#include "llvm/Support/Debug.h"
+#include "llvm/Support/ErrorHandling.h"
 #include "llvm/Target/TargetMachine.h"
 #include 
 
@@ -36,38 +32,43 @@ using namespace llvm;
 
 #define AARCH64_SLS_HARDENING_NAME "AArch64 sls hardening pass"
 
+static const char SLSBLRNamePrefix[] = "__llvm_slsblr_thunk_";
+
 namespace {
 
-class AArch64SLSHardening : public MachineFunctionPass {
-public:
-  const TargetInstrInfo *TII;
-  const TargetRegisterInfo *TRI;
-  const AArch64Subtarget *ST;
+// Set of inserted thunks: bitmask with bits corresponding to
+// indexes in SLSBLRThunks array.
+typedef uint32_t ThunksSet;
 
-  static char ID;
-
-  AArch64SLSHardening() : MachineFunctionPass(ID) {
-

[llvm-branch-commits] [llvm] [AArch64] Only create called thunks when hardening against SLS (PR #97472)

2024-07-02 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-aarch64

Author: Anatoly Trosinenko (atrosinenko)


Changes

In preparation for implementing hardening of BLRA* instructions, restrict thunk 
function generation to only the thunks being actually called from any function. 
As described in the existing comments, emitting all possible thunks for BLRAA 
and BLRAB instructions would mean adding about 1800 functions in total, most of 
which are likely not to be called.

This commit merges AArch64SLSHardening class into SLSBLRThunkInserter, so 
thunks can be created as needed while rewriting a machine function. The usages 
of TII, TRI and ST fields of AArch64SLSHardening class are replaced with 
requesting them in-place, as ThunkInserter assumes multiple "entry points" in 
contrast to the only runOnMachineFunction method of AArch64SLSHardening.

The runOnMachineFunction method essentially replaces pre-existing insertThunks 
implementation as there is no more need to insert all possible thunks 
unconditionally. Instead, thunks are created on first use from inside of 
insertThunks method.

---
Full diff: https://github.com/llvm/llvm-project/pull/97472.diff


8 Files Affected:

- (modified) llvm/lib/Target/AArch64/AArch64.h (-1) 
- (modified) llvm/lib/Target/AArch64/AArch64SLSHardening.cpp (+72-116) 
- (modified) llvm/lib/Target/AArch64/AArch64TargetMachine.cpp (-1) 
- (modified) llvm/test/CodeGen/AArch64/O0-pipeline.ll (-1) 
- (modified) llvm/test/CodeGen/AArch64/O3-pipeline.ll (-1) 
- (modified) llvm/test/CodeGen/AArch64/arm64-opt-remarks-lazy-bfi.ll (-8) 
- (modified) llvm/test/CodeGen/AArch64/speculation-hardening-sls-blr-bti.mir 
(-20) 
- (modified) llvm/test/CodeGen/AArch64/speculation-hardening-sls-blr.mir 
(+3-17) 


``diff
diff --git a/llvm/lib/Target/AArch64/AArch64.h 
b/llvm/lib/Target/AArch64/AArch64.h
index 6f2aeb83a451a..66ad701d83958 100644
--- a/llvm/lib/Target/AArch64/AArch64.h
+++ b/llvm/lib/Target/AArch64/AArch64.h
@@ -40,7 +40,6 @@ FunctionPass *createAArch64ISelDag(AArch64TargetMachine &TM,
 FunctionPass *createAArch64StorePairSuppressPass();
 FunctionPass *createAArch64ExpandPseudoPass();
 FunctionPass *createAArch64SLSHardeningPass();
-FunctionPass *createAArch64IndirectThunks();
 FunctionPass *createAArch64SpeculationHardeningPass();
 FunctionPass *createAArch64LoadStoreOptimizationPass();
 ModulePass *createAArch64LowerHomogeneousPrologEpilogPass();
diff --git a/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp 
b/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp
index d35386eaeab12..b4ebd7d5377c2 100644
--- a/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp
+++ b/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp
@@ -13,20 +13,16 @@
 
 #include "AArch64InstrInfo.h"
 #include "AArch64Subtarget.h"
-#include "Utils/AArch64BaseInfo.h"
 #include "llvm/CodeGen/IndirectThunks.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/MachineFunction.h"
-#include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineInstr.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/CodeGen/MachineOperand.h"
-#include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/RegisterScavenging.h"
 #include "llvm/IR/DebugLoc.h"
 #include "llvm/Pass.h"
-#include "llvm/Support/CodeGen.h"
-#include "llvm/Support/Debug.h"
+#include "llvm/Support/ErrorHandling.h"
 #include "llvm/Target/TargetMachine.h"
 #include 
 
@@ -36,38 +32,43 @@ using namespace llvm;
 
 #define AARCH64_SLS_HARDENING_NAME "AArch64 sls hardening pass"
 
+static const char SLSBLRNamePrefix[] = "__llvm_slsblr_thunk_";
+
 namespace {
 
-class AArch64SLSHardening : public MachineFunctionPass {
-public:
-  const TargetInstrInfo *TII;
-  const TargetRegisterInfo *TRI;
-  const AArch64Subtarget *ST;
+// Set of inserted thunks: bitmask with bits corresponding to
+// indexes in SLSBLRThunks array.
+typedef uint32_t ThunksSet;
 
-  static char ID;
-
-  AArch64SLSHardening() : MachineFunctionPass(ID) {
-initializeAArch64SLSHardeningPass(*PassRegistry::getPassRegistry());
+struct SLSBLRThunkInserter : ThunkInserter {
+public:
+  const char *getThunkPrefix() { return SLSBLRNamePrefix; }
+  bool mayUseThunk(const MachineFunction &MF) {
+// FIXME: ComdatThunks is only accumulated until the first thunk is 
created.
+ComdatThunks &= !MF.getSubtarget().hardenSlsNoComdat();
+// We are inserting barriers aside from thunk calls, so
+// check hardenSlsRetBr() as well.
+return MF.getSubtarget().hardenSlsBlr() ||
+   MF.getSubtarget().hardenSlsRetBr();
   }
+  ThunksSet insertThunks(MachineModuleInfo &MMI, MachineFunction &MF,
+ ThunksSet ExistingThunks);
+  void populateThunk(MachineFunction &MF);
 
-  bool runOnMachineFunction(MachineFunction &Fn) override;
+private:
+  bool ComdatThunks = true;
 
-  StringRef getPassName() const override { return AARCH64_SLS_HARDENING_NAME; }
+  bool hardenReturnsAndBRs(MachineModuleInfo &MMI, MachineBasicBlock &MBB);
+  bool hardenB

[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)

2024-07-02 Thread Shaw Young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/96596

>From 05d59574d6260b98a469921eb2fccf5398bfafb6 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 24 Jun 2024 23:00:59 -0700
Subject: [PATCH 01/11] Added call to matchWithCallsAsAnchors

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index aafffac3d4b1c..1a0e5d239d252 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -479,6 +479,9 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
 if (!YamlBF.Used && BF && !ProfiledFunctions.count(BF))
   matchProfileToFunction(YamlBF, *BF);
 
+  uint64_t MatchedWithCallsAsAnchors = 0;
+  matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
+
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 if (!YamlBF.Used && opts::Verbosity >= 1)
   errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name

>From 77ef0008f4f5987719555e6cc3e32da812ae0f31 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 24 Jun 2024 23:11:43 -0700
Subject: [PATCH 02/11] Changed CallHashToBF representation

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index 1a0e5d239d252..91b01a99c7485 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -29,6 +29,10 @@ static llvm::cl::opt
cl::desc("ignore hash while reading function profile"),
cl::Hidden, cl::cat(BoltOptCategory));
 
+llvm::cl::opt MatchWithCallsAsAnchors("match-with-calls-as-anchors",
+  cl::desc("Matches with calls as anchors"),
+  cl::Hidden, cl::cat(BoltOptCategory));
+
 llvm::cl::opt ProfileUseDFS("profile-use-dfs",
   cl::desc("use DFS order for YAML profile"),
   cl::Hidden, cl::cat(BoltOptCategory));
@@ -353,7 +357,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 llvm_unreachable("Unhandled HashFunction");
   };
 
-  std::unordered_map CallHashToBF;
+  std::unordered_map CallHashToBF;
 
   for (BinaryFunction *BF : BC.getAllBinaryFunctions()) {
 if (ProfiledFunctions.count(BF))
@@ -375,12 +379,12 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
   for (const std::string &FunctionName : FunctionNames)
 HashString.append(FunctionName);
 }
-CallHashToBF.emplace(ComputeCallHash(HashString), BF);
+CallHashToBF[ComputeCallHash(HashString)] = BF;
   }
 
   std::unordered_map ProfiledFunctionIdToName;
 
-  for (const yaml::bolt::BinaryFunctionProfile YamlBF : YamlBP.Functions)
+  for (const yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 ProfiledFunctionIdToName[YamlBF.Id] = YamlBF.Name;
 
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions) {
@@ -401,7 +405,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 auto It = CallHashToBF.find(Hash);
 if (It == CallHashToBF.end())
   continue;
-matchProfileToFunction(YamlBF, It->second);
+matchProfileToFunction(YamlBF, *It->second);
 ++MatchedWithCallsAsAnchors;
   }
 }
@@ -480,7 +484,8 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
   matchProfileToFunction(YamlBF, *BF);
 
   uint64_t MatchedWithCallsAsAnchors = 0;
-  matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
+  if (opts::MatchWithCallsAsAnchors)
+matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
 
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 if (!YamlBF.Used && opts::Verbosity >= 1)

>From ea7cb68ab9e8e158412c2e752986968968a60d93 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Tue, 25 Jun 2024 09:28:39 -0700
Subject: [PATCH 03/11] Changed BF called FunctionNames to multiset

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index 91b01a99c7485..3b3d73f7af023 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -365,7 +365,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 
 std::string HashString;
 for (const auto &BB : BF->blocks()) {
-  std::set FunctionNames;
+  std::multiset FunctionNames;
   for (const MCInst &Instr : BB) {
 // Skip non-call instructions.
 if (!BC.MIB->isCall(Instr))
@@ -397,9 +397,8 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 std::string &FunctionName = ProfiledFunctionIdToName[CallSite.DestId];
 FunctionNames.insert(FunctionName);
 

[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)

2024-07-02 Thread Shaw Young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/96596

>From 05d59574d6260b98a469921eb2fccf5398bfafb6 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 24 Jun 2024 23:00:59 -0700
Subject: [PATCH 01/12] Added call to matchWithCallsAsAnchors

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index aafffac3d4b1c..1a0e5d239d252 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -479,6 +479,9 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
 if (!YamlBF.Used && BF && !ProfiledFunctions.count(BF))
   matchProfileToFunction(YamlBF, *BF);
 
+  uint64_t MatchedWithCallsAsAnchors = 0;
+  matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
+
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 if (!YamlBF.Used && opts::Verbosity >= 1)
   errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name

>From 77ef0008f4f5987719555e6cc3e32da812ae0f31 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 24 Jun 2024 23:11:43 -0700
Subject: [PATCH 02/12] Changed CallHashToBF representation

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index 1a0e5d239d252..91b01a99c7485 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -29,6 +29,10 @@ static llvm::cl::opt
cl::desc("ignore hash while reading function profile"),
cl::Hidden, cl::cat(BoltOptCategory));
 
+llvm::cl::opt MatchWithCallsAsAnchors("match-with-calls-as-anchors",
+  cl::desc("Matches with calls as anchors"),
+  cl::Hidden, cl::cat(BoltOptCategory));
+
 llvm::cl::opt ProfileUseDFS("profile-use-dfs",
   cl::desc("use DFS order for YAML profile"),
   cl::Hidden, cl::cat(BoltOptCategory));
@@ -353,7 +357,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 llvm_unreachable("Unhandled HashFunction");
   };
 
-  std::unordered_map CallHashToBF;
+  std::unordered_map CallHashToBF;
 
   for (BinaryFunction *BF : BC.getAllBinaryFunctions()) {
 if (ProfiledFunctions.count(BF))
@@ -375,12 +379,12 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
   for (const std::string &FunctionName : FunctionNames)
 HashString.append(FunctionName);
 }
-CallHashToBF.emplace(ComputeCallHash(HashString), BF);
+CallHashToBF[ComputeCallHash(HashString)] = BF;
   }
 
   std::unordered_map ProfiledFunctionIdToName;
 
-  for (const yaml::bolt::BinaryFunctionProfile YamlBF : YamlBP.Functions)
+  for (const yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 ProfiledFunctionIdToName[YamlBF.Id] = YamlBF.Name;
 
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions) {
@@ -401,7 +405,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 auto It = CallHashToBF.find(Hash);
 if (It == CallHashToBF.end())
   continue;
-matchProfileToFunction(YamlBF, It->second);
+matchProfileToFunction(YamlBF, *It->second);
 ++MatchedWithCallsAsAnchors;
   }
 }
@@ -480,7 +484,8 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
   matchProfileToFunction(YamlBF, *BF);
 
   uint64_t MatchedWithCallsAsAnchors = 0;
-  matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
+  if (opts::MatchWithCallsAsAnchors)
+matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
 
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 if (!YamlBF.Used && opts::Verbosity >= 1)

>From ea7cb68ab9e8e158412c2e752986968968a60d93 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Tue, 25 Jun 2024 09:28:39 -0700
Subject: [PATCH 03/12] Changed BF called FunctionNames to multiset

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index 91b01a99c7485..3b3d73f7af023 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -365,7 +365,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 
 std::string HashString;
 for (const auto &BB : BF->blocks()) {
-  std::set FunctionNames;
+  std::multiset FunctionNames;
   for (const MCInst &Instr : BB) {
 // Skip non-call instructions.
 if (!BC.MIB->isCall(Instr))
@@ -397,9 +397,8 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 std::string &FunctionName = ProfiledFunctionIdToName[CallSite.DestId];
 FunctionNames.insert(FunctionName);
 

[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)

2024-07-02 Thread Shaw Young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/96596

>From 05d59574d6260b98a469921eb2fccf5398bfafb6 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 24 Jun 2024 23:00:59 -0700
Subject: [PATCH 01/13] Added call to matchWithCallsAsAnchors

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index aafffac3d4b1c..1a0e5d239d252 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -479,6 +479,9 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
 if (!YamlBF.Used && BF && !ProfiledFunctions.count(BF))
   matchProfileToFunction(YamlBF, *BF);
 
+  uint64_t MatchedWithCallsAsAnchors = 0;
+  matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
+
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 if (!YamlBF.Used && opts::Verbosity >= 1)
   errs() << "BOLT-WARNING: profile ignored for function " << YamlBF.Name

>From 77ef0008f4f5987719555e6cc3e32da812ae0f31 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 24 Jun 2024 23:11:43 -0700
Subject: [PATCH 02/13] Changed CallHashToBF representation

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index 1a0e5d239d252..91b01a99c7485 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -29,6 +29,10 @@ static llvm::cl::opt
cl::desc("ignore hash while reading function profile"),
cl::Hidden, cl::cat(BoltOptCategory));
 
+llvm::cl::opt MatchWithCallsAsAnchors("match-with-calls-as-anchors",
+  cl::desc("Matches with calls as anchors"),
+  cl::Hidden, cl::cat(BoltOptCategory));
+
 llvm::cl::opt ProfileUseDFS("profile-use-dfs",
   cl::desc("use DFS order for YAML profile"),
   cl::Hidden, cl::cat(BoltOptCategory));
@@ -353,7 +357,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 llvm_unreachable("Unhandled HashFunction");
   };
 
-  std::unordered_map CallHashToBF;
+  std::unordered_map CallHashToBF;
 
   for (BinaryFunction *BF : BC.getAllBinaryFunctions()) {
 if (ProfiledFunctions.count(BF))
@@ -375,12 +379,12 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
   for (const std::string &FunctionName : FunctionNames)
 HashString.append(FunctionName);
 }
-CallHashToBF.emplace(ComputeCallHash(HashString), BF);
+CallHashToBF[ComputeCallHash(HashString)] = BF;
   }
 
   std::unordered_map ProfiledFunctionIdToName;
 
-  for (const yaml::bolt::BinaryFunctionProfile YamlBF : YamlBP.Functions)
+  for (const yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 ProfiledFunctionIdToName[YamlBF.Id] = YamlBF.Name;
 
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions) {
@@ -401,7 +405,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 auto It = CallHashToBF.find(Hash);
 if (It == CallHashToBF.end())
   continue;
-matchProfileToFunction(YamlBF, It->second);
+matchProfileToFunction(YamlBF, *It->second);
 ++MatchedWithCallsAsAnchors;
   }
 }
@@ -480,7 +484,8 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
   matchProfileToFunction(YamlBF, *BF);
 
   uint64_t MatchedWithCallsAsAnchors = 0;
-  matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
+  if (opts::MatchWithCallsAsAnchors)
+matchWithCallsAsAnchors(BC,  MatchedWithCallsAsAnchors);
 
   for (yaml::bolt::BinaryFunctionProfile &YamlBF : YamlBP.Functions)
 if (!YamlBF.Used && opts::Verbosity >= 1)

>From ea7cb68ab9e8e158412c2e752986968968a60d93 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Tue, 25 Jun 2024 09:28:39 -0700
Subject: [PATCH 03/13] Changed BF called FunctionNames to multiset

Created using spr 1.3.4
---
 bolt/lib/Profile/YAMLProfileReader.cpp | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/bolt/lib/Profile/YAMLProfileReader.cpp 
b/bolt/lib/Profile/YAMLProfileReader.cpp
index 91b01a99c7485..3b3d73f7af023 100644
--- a/bolt/lib/Profile/YAMLProfileReader.cpp
+++ b/bolt/lib/Profile/YAMLProfileReader.cpp
@@ -365,7 +365,7 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 
 std::string HashString;
 for (const auto &BB : BF->blocks()) {
-  std::set FunctionNames;
+  std::multiset FunctionNames;
   for (const MCInst &Instr : BB) {
 // Skip non-call instructions.
 if (!BC.MIB->isCall(Instr))
@@ -397,9 +397,8 @@ void YAMLProfileReader::matchWithCallsAsAnchors(
 std::string &FunctionName = ProfiledFunctionIdToName[CallSite.DestId];
 FunctionNames.insert(FunctionName);
 

[llvm-branch-commits] [BOLT][NFC] Refactor function matching (PR #97502)

2024-07-02 Thread Shaw Young via llvm-branch-commits

https://github.com/shawbyoung created 
https://github.com/llvm/llvm-project/pull/97502

Moved function matching techniques into separate helper functions.

Test Plan: n/a



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)

2024-07-02 Thread Shaw Young via llvm-branch-commits

shawbyoung wrote:

cc @WenleiHe @wlei-llvm

https://github.com/llvm/llvm-project/pull/96596
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)

2024-07-02 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov edited https://github.com/llvm/llvm-project/pull/95884
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)

2024-07-02 Thread Amir Ayupov via llvm-branch-commits


@@ -342,6 +350,107 @@ bool YAMLProfileReader::mayHaveProfileData(const 
BinaryFunction &BF) {
   return false;
 }
 
+uint64_t YAMLProfileReader::matchWithNameSimilarity(BinaryContext &BC) {
+  uint64_t MatchedWithNameSimilarity = 0;
+  ItaniumPartialDemangler ItaniumPartialDemangler;

aaupov wrote:

```suggestion
  ItaniumPartialDemangler Demangler;
```

https://github.com/llvm/llvm-project/pull/95884
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)

2024-07-02 Thread Amir Ayupov via llvm-branch-commits


@@ -342,6 +350,107 @@ bool YAMLProfileReader::mayHaveProfileData(const 
BinaryFunction &BF) {
   return false;
 }
 
+uint64_t YAMLProfileReader::matchWithNameSimilarity(BinaryContext &BC) {
+  uint64_t MatchedWithNameSimilarity = 0;
+  ItaniumPartialDemangler ItaniumPartialDemangler;
+
+  // Demangle and derive namespace from function name.
+  auto DemangleName = [&](std::string &FunctionName) {
+StringRef RestoredName = NameResolver::restore(FunctionName);
+return demangle(RestoredName);
+  };
+  auto DeriveNameSpace = [&](std::string &DemangledName) {
+if (ItaniumPartialDemangler.partialDemangle(DemangledName.c_str()))
+  return std::string("");
+std::vector Buffer(DemangledName.begin(), DemangledName.end());
+size_t BufferSize = Buffer.size();
+char *NameSpace = ItaniumPartialDemangler.getFunctionDeclContextName(
+&Buffer[0], &BufferSize);

aaupov wrote:

What's bothering me here is the use of BufferSize – it's an output parameter 
from `getFunctionDecContextName` and yet unused in the construction of output 
string.

https://github.com/llvm/llvm-project/pull/95884
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)

2024-07-02 Thread Amir Ayupov via llvm-branch-commits


@@ -342,6 +350,107 @@ bool YAMLProfileReader::mayHaveProfileData(const 
BinaryFunction &BF) {
   return false;
 }
 
+uint64_t YAMLProfileReader::matchWithNameSimilarity(BinaryContext &BC) {
+  uint64_t MatchedWithNameSimilarity = 0;
+  ItaniumPartialDemangler ItaniumPartialDemangler;
+
+  // Demangle and derive namespace from function name.
+  auto DemangleName = [&](std::string &FunctionName) {
+StringRef RestoredName = NameResolver::restore(FunctionName);
+return demangle(RestoredName);
+  };
+  auto DeriveNameSpace = [&](std::string &DemangledName) {
+if (ItaniumPartialDemangler.partialDemangle(DemangledName.c_str()))
+  return std::string("");
+std::vector Buffer(DemangledName.begin(), DemangledName.end());
+size_t BufferSize = Buffer.size();
+char *NameSpace = ItaniumPartialDemangler.getFunctionDeclContextName(
+&Buffer[0], &BufferSize);
+return NameSpace ? std::string(NameSpace) : std::string("");
+  };
+
+  // Maps namespaces to associated function block counts and gets profile
+  // function names and namespaces to minimize the number of BFs to process and
+  // avoid repeated name demangling/namespace derivision.
+  StringMap> NamespaceToProfiledBFSizes;
+  std::vector ProfileBFDemangledNames;
+  ProfileBFDemangledNames.reserve(YamlBP.Functions.size());
+  std::vector ProfiledBFNamespaces;
+  ProfiledBFNamespaces.reserve(YamlBP.Functions.size());
+
+  for (auto &YamlBF : YamlBP.Functions) {
+std::string YamlBFDemangledName = DemangleName(YamlBF.Name);
+ProfileBFDemangledNames.push_back(YamlBFDemangledName);
+std::string YamlBFNamespace = DeriveNameSpace(YamlBFDemangledName);
+ProfiledBFNamespaces.push_back(YamlBFNamespace);
+NamespaceToProfiledBFSizes[YamlBFNamespace].insert(YamlBF.NumBasicBlocks);
+  }
+
+  StringMap> NamespaceToBFs;
+
+  // Maps namespaces to BFs disincluding binary functions with no equal sized

aaupov wrote:

```suggestion
  // Maps namespaces to BFs excluding binary functions with no equal sized
```

https://github.com/llvm/llvm-project/pull/95884
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)

2024-07-02 Thread Amir Ayupov via llvm-branch-commits


@@ -342,6 +350,107 @@ bool YAMLProfileReader::mayHaveProfileData(const 
BinaryFunction &BF) {
   return false;
 }
 
+uint64_t YAMLProfileReader::matchWithNameSimilarity(BinaryContext &BC) {
+  uint64_t MatchedWithNameSimilarity = 0;
+  ItaniumPartialDemangler ItaniumPartialDemangler;
+
+  // Demangle and derive namespace from function name.
+  auto DemangleName = [&](std::string &FunctionName) {
+StringRef RestoredName = NameResolver::restore(FunctionName);
+return demangle(RestoredName);
+  };
+  auto DeriveNameSpace = [&](std::string &DemangledName) {
+if (ItaniumPartialDemangler.partialDemangle(DemangledName.c_str()))
+  return std::string("");
+std::vector Buffer(DemangledName.begin(), DemangledName.end());
+size_t BufferSize = Buffer.size();
+char *NameSpace = ItaniumPartialDemangler.getFunctionDeclContextName(
+&Buffer[0], &BufferSize);
+return NameSpace ? std::string(NameSpace) : std::string("");
+  };
+
+  // Maps namespaces to associated function block counts and gets profile
+  // function names and namespaces to minimize the number of BFs to process and
+  // avoid repeated name demangling/namespace derivision.

aaupov wrote:

```suggestion
  // avoid repeated name demangling/namespace derivation.
```

https://github.com/llvm/llvm-project/pull/95884
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)

2024-07-02 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov commented:

LG in general with some comments

https://github.com/llvm/llvm-project/pull/95884
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)

2024-07-02 Thread Amir Ayupov via llvm-branch-commits


@@ -342,6 +350,107 @@ bool YAMLProfileReader::mayHaveProfileData(const 
BinaryFunction &BF) {
   return false;
 }
 
+uint64_t YAMLProfileReader::matchWithNameSimilarity(BinaryContext &BC) {
+  uint64_t MatchedWithNameSimilarity = 0;
+  ItaniumPartialDemangler ItaniumPartialDemangler;
+
+  // Demangle and derive namespace from function name.
+  auto DemangleName = [&](std::string &FunctionName) {
+StringRef RestoredName = NameResolver::restore(FunctionName);
+return demangle(RestoredName);
+  };
+  auto DeriveNameSpace = [&](std::string &DemangledName) {
+if (ItaniumPartialDemangler.partialDemangle(DemangledName.c_str()))
+  return std::string("");
+std::vector Buffer(DemangledName.begin(), DemangledName.end());
+size_t BufferSize = Buffer.size();
+char *NameSpace = ItaniumPartialDemangler.getFunctionDeclContextName(
+&Buffer[0], &BufferSize);
+return NameSpace ? std::string(NameSpace) : std::string("");

aaupov wrote:

```suggestion
return std::string(NameSpace, BufferSize);
```

https://github.com/llvm/llvm-project/pull/95884
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Match functions with name similarity (PR #95884)

2024-07-02 Thread Amir Ayupov via llvm-branch-commits


@@ -23,6 +26,11 @@ extern cl::opt Verbosity;
 extern cl::OptionCategory BoltOptCategory;
 extern cl::opt InferStaleProfile;
 
+cl::opt NameSimilarityFunctionMatchingThreshold(
+"name-similarity-function-matching-threshold",
+cl::desc("Match functions using namespace and edit distance."), 
cl::init(0),

aaupov wrote:

```suggestion
cl::desc("Match functions using namespace and edit distance"), cl::init(0),
```

https://github.com/llvm/llvm-project/pull/95884
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm-objdump] -r: support CREL (PR #97382)

2024-07-02 Thread Fangrui Song via llvm-branch-commits


@@ -2687,6 +2687,15 @@ void Dumper::printRelocations() {
<< "VALUE\n";
 
 for (SectionRef Section : P.second) {
+  // CREL requires decoding and has its specific errors.
+  if (O.isELF() && ELFSectionRef(Section).getType() == ELF::SHT_CREL) {
+const ELFObjectFileBase *ELF = cast(&O);
+StringRef Err = ELF->getCrelError(Section);
+if (!Err.empty()) {
+  reportUniqueWarning(Err);

MaskRay wrote:

Adopted `getCrelDecodeProblem`.

> My expectation for a decode error would be that we report it, but continue as 
> best we can. I would expect a non-zero error code though. Please ignore if 
> this isn't llvm-objdump practice.

llvm-readobj tries to continue execution and report warnings. While a lot of 
code in llvm-objdump reports errors, there have been efforts to migrate to 
warnings: https://reviews.llvm.org/D154754

https://github.com/llvm/llvm-project/pull/97382
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)

2024-07-02 Thread Amir Ayupov via llvm-branch-commits


@@ -70,12 +70,16 @@ class YAMLProfileReader : public ProfileReaderBase {
   std::vector ProfileBFs;
 
   /// Populate \p Function profile with the one supplied in YAML format.
-  bool parseFunctionProfile(BinaryFunction &Function,
-const yaml::bolt::BinaryFunctionProfile &YamlBF);
+  bool parseFunctionProfile(
+  const DenseMap &IdToFunctionName,

aaupov wrote:

Define the map as YAMLProfileReader member, with a comment about its use.

https://github.com/llvm/llvm-project/pull/96596
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Match blocks with calls as anchors (PR #96596)

2024-07-02 Thread Amir Ayupov via llvm-branch-commits


@@ -479,6 +481,11 @@ Error YAMLProfileReader::readProfile(BinaryContext &BC) {
   NormalizeByInsnCount = usesEvent("cycles") || usesEvent("instructions");
   NormalizeByCalls = usesEvent("branches");
 
+  // Map profiled function ids to names.
+  DenseMap IdToFunctionName;

aaupov wrote:

Do we need a map and not a vector?

https://github.com/llvm/llvm-project/pull/96596
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm-objdump] -r: support CREL (PR #97382)

2024-07-02 Thread Fangrui Song via llvm-branch-commits

https://github.com/MaskRay updated 
https://github.com/llvm/llvm-project/pull/97382


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm-objdump] -r: support CREL (PR #97382)

2024-07-02 Thread Fangrui Song via llvm-branch-commits

https://github.com/MaskRay updated 
https://github.com/llvm/llvm-project/pull/97382


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm-objdump] -r: support CREL (PR #97382)

2024-07-02 Thread Fangrui Song via llvm-branch-commits

https://github.com/MaskRay edited 
https://github.com/llvm/llvm-project/pull/97382
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm-objdump] -r: support CREL (PR #97382)

2024-07-02 Thread Fangrui Song via llvm-branch-commits


@@ -1453,6 +1525,15 @@ template  bool 
ELFObjectFile::isRelocatableObject() const {
   return EF.getHeader().e_type == ELF::ET_REL;
 }
 
+template 
+StringRef ELFObjectFile::getCrelError(DataRefImpl Sec) const {
+  uintptr_t SHT = reinterpret_cast(cantFail(EF.sections()).begin());
+  auto I = (Sec.p - SHT) / EF.getHeader().e_shentsize;
+  if (I < CrelErrs.size())
+return CrelErrs[I];
+  return "";

MaskRay wrote:

I have changed `mutable SmallVector CrelErrs;` so that it is 
only non-empty when there is a decoder problem. This is now needed.

https://github.com/llvm/llvm-project/pull/97382
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm-objdump] -r: support CREL (PR #97382)

2024-07-02 Thread Fangrui Song via llvm-branch-commits


@@ -0,0 +1,213 @@
+# RUN: yaml2obj --docnum=1 %s -o %t
+# RUN: llvm-objdump -r %t | FileCheck %s --strict-whitespace --match-full-lines
+
+#  CHECK:RELOCATION RECORDS FOR [.text]:
+# CHECK-NEXT:OFFSET   TYPE VALUE
+# CHECK-NEXT:0001 R_X86_64_32  g1+0x1
+# CHECK-NEXT:0002 R_X86_64_64  l1+0x2
+# CHECK-NEXT: R_X86_64_32S g1-0x1
+# CHECK-NEXT:0004 R_X86_64_32S .text-0x8000
+#CHECK-EMPTY:
+# CHECK-NEXT:RELOCATION RECORDS FOR [nonalloc]:
+# CHECK-NEXT:OFFSET   TYPE VALUE
+# CHECK-NEXT:0010 R_X86_64_64  g1+0x1
+# CHECK-NEXT:0020 R_X86_64_64  g2+0x2
+# CHECK-NEXT:0030 R_X86_64_64  *ABS*
+#  CHECK-NOT:{{.}}
+
+--- !ELF
+FileHeader: !FileHeader
+  Class: ELFCLASS64
+  Data: ELFDATA2LSB
+  Type: ET_REL
+  Machine: EM_X86_64
+
+Sections:
+- Name: .text
+  Type: SHT_PROGBITS
+  Content: ""
+  Flags: [SHF_ALLOC]
+- Name: .crel.text
+  Type: SHT_CREL
+  Info: .text
+  Link: .symtab
+  Relocations:
+- Offset: 0x1
+  Symbol: g1
+  Type:   R_X86_64_32
+  Addend: 1
+- Offset: 0x2
+  Symbol: l1
+  Type:   R_X86_64_64
+  Addend: 2
+- Offset: 0x0
+  Symbol: g1
+  Type:   R_X86_64_32S
+  Addend: 0x
+- Offset: 0x4
+  Symbol: .text
+  Type:   R_X86_64_32S
+  Addend: 0x8000
+- Name: nonalloc
+  Type: SHT_PROGBITS
+  Size: 0x30
+- Name: .crelnonalloc
+  Type: SHT_CREL
+  Info: nonalloc
+  Link: .symtab
+  Relocations:
+- Offset: 0x10
+  Symbol: g1
+  Type:   R_X86_64_64
+  Addend: 1
+- Offset: 0x20
+  Symbol: g2
+  Type:   R_X86_64_64
+  Addend: 2
+- Offset: 0x30
+  Symbol: 0
+  Type:   R_X86_64_64
+
+Symbols:
+  - Name: .text
+Type: STT_SECTION
+Section: .text
+  - Name:l1
+  - Name:g1
+Section: .text
+Value:   0x0
+Size:4
+Binding: STB_GLOBAL
+  - Name:g2
+Binding: STB_GLOBAL
+
+## Check relocation formatting on ELFCLASS32 as well.
+# RUN: yaml2obj --docnum=2 %s > %t2
+# RUN: llvm-objdump -r %t2 | FileCheck %s --check-prefix=ELF32 
--strict-whitespace --match-full-lines
+
+#  ELF32:RELOCATION RECORDS FOR [.text]:
+# ELF32-NEXT:OFFSET   TYPE VALUE
+# ELF32-NEXT:0008 R_ARM_REL32  l1+0x1
+# ELF32-NEXT:0004 R_ARM_ABS32  g1
+
+--- !ELF
+FileHeader:
+  Class:   ELFCLASS32
+  Data:ELFDATA2MSB
+  Type:ET_REL
+  Machine: EM_ARM
+Sections:
+- Name: .text
+  Type: SHT_PROGBITS
+  Size: 0x10
+- Name: .crel.text
+  Type: SHT_CREL
+  Info: .text
+  Link: .symtab
+  Relocations:
+- Offset: 0x8
+  Symbol: l1
+  Type:   R_ARM_REL32
+  Addend: 1
+- Offset: 0x4
+  Symbol: g1
+  Type:   R_ARM_ABS32
+Symbols:
+  - Name:l1
+  - Name:g1
+Binding: STB_GLOBAL
+
+## Check CREL with implicit addends.
+# RUN: yaml2obj --docnum=3 %s -o %t3
+# RUN: llvm-objdump -r %t3 | FileCheck %s --check-prefix=IMPLICIT 
--strict-whitespace --match-full-lines
+#  IMPLICIT:RELOCATION RECORDS FOR [.data]:
+# IMPLICIT-NEXT:OFFSET   TYPE VALUE
+# IMPLICIT-NEXT:001f R_X86_64_32  g1
+# IMPLICIT-NEXT:003f R_X86_64_64  g1
+# IMPLICIT-NEXT: R_X86_64_32S l1
+--- !ELF
+FileHeader:
+  Class: ELFCLASS64
+  Data:  ELFDATA2LSB
+  Type:  ET_REL
+  Machine:   EM_X86_64
+Sections:
+  - Name:.text
+Type:SHT_PROGBITS
+  - Name:.data
+Type:SHT_PROGBITS
+  - Name:.crel.data
+Type:SHT_CREL
+Flags:   [ SHF_INFO_LINK ]
+Link:.symtab
+Info:.data
+Content: 187f030a82017787feff077f0a
+Symbols:
+  - Name:.text
+Type:STT_SECTION
+Section: .text
+  - Name:l1
+Section: .text
+  - Name:g1
+Section: .text
+Binding: STB_GLOBAL
+
+## Test errors.
+# RUN: yaml2obj --docnum=4 %s -o %t.err
+# RUN: llvm-objdump -r %t.err 2>&1 | FileCheck %s --check-prefix=ERR 
-DFILE=%t.err
+
+#  ERR:RELOCATION RECORDS FOR [.data]:
+# ERR-NEXT:OFFSET   TYPE VALUE
+# ERR-NEXT:warning: '[[FILE]]': unable to decode LEB128 at offset 0x: 
malformed uleb128, extends past end
+#  ERR-NOT:{{.}}
+
+--- !ELF
+FileHeader:
+  Class: ELFCLASS64
+  Data:  ELFDATA2LSB
+  Type:  ET_REL
+  Machine:   EM_X86_64
+Sections:
+  - Name:.text
+Type:SHT_PROGBITS
+  - Name:.data
+Type:SHT_PROGBITS
+  - Name:.crel.data
+Type:SHT_CREL
+Flags:   []
+Link:.symtab
+Info:.data
+Symbols:
+  - Name:.text
+Type:STT_SECTION
+Section: .text
+
+# RUN: yaml2obj --docnum=5 %s -o %t.err2
+# RUN: llvm-objdump -r %t.err2 2>&1 | FileCheck %s --check-prefix=ERR2 
-DFILE=%t.err

[llvm-branch-commits] [llvm-objcopy] Support CREL (PR #97521)

2024-07-02 Thread Fangrui Song via llvm-branch-commits

https://github.com/MaskRay created 
https://github.com/llvm/llvm-project/pull/97521

llvm-objcopy may modify the symbol table and need to rewrite
relocations. For CREL, while we can reuse the decoder from #91280, we
need an encoder to support CREL.

Since MC/ELFObjectWriter.cpp has an existing encoder, and MC is at a
lower layer than Object, extract the encoder to a new header file
llvm/MC/MCELFExtras.h.

Link: 
https://discourse.llvm.org/t/rfc-crel-a-compact-relocation-format-for-elf/77600



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm-objcopy] Support CREL (PR #97521)

2024-07-02 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-llvm-binary-utilities

@llvm/pr-subscribers-mc

Author: Fangrui Song (MaskRay)


Changes

llvm-objcopy may modify the symbol table and need to rewrite
relocations. For CREL, while we can reuse the decoder from #91280, we
need an encoder to support CREL.

Since MC/ELFObjectWriter.cpp has an existing encoder, and MC is at a
lower layer than Object, extract the encoder to a new header file
llvm/MC/MCELFExtras.h.

Link: 
https://discourse.llvm.org/t/rfc-crel-a-compact-relocation-format-for-elf/77600


---
Full diff: https://github.com/llvm/llvm-project/pull/97521.diff


7 Files Affected:

- (modified) llvm/include/llvm/BinaryFormat/ELF.h (+9) 
- (added) llvm/include/llvm/MC/MCELFExtras.h (+60) 
- (modified) llvm/lib/MC/ELFObjectWriter.cpp (+10-39) 
- (modified) llvm/lib/ObjCopy/ELF/ELFObject.cpp (+40-7) 
- (modified) llvm/lib/ObjCopy/ELF/ELFObject.h (+3-2) 
- (added) llvm/test/tools/llvm-objcopy/ELF/crel.test (+140) 
- (modified) llvm/test/tools/llvm-objcopy/ELF/strip-reloc-symbol.test (+3-1) 


``diff
diff --git a/llvm/include/llvm/BinaryFormat/ELF.h 
b/llvm/include/llvm/BinaryFormat/ELF.h
index 32cc9ff8cbb78..456c6b4a7 100644
--- a/llvm/include/llvm/BinaryFormat/ELF.h
+++ b/llvm/include/llvm/BinaryFormat/ELF.h
@@ -22,6 +22,7 @@
 #include "llvm/ADT/StringRef.h"
 #include 
 #include 
+#include 
 
 namespace llvm {
 namespace ELF {
@@ -1430,6 +1431,14 @@ struct Elf64_Rela {
   }
 };
 
+// In-memory representation of CREL. The serialized representation uses LEB128.
+template  struct Elf_Crel {
+  std::conditional_t r_offset;
+  uint32_t r_symidx;
+  uint32_t r_type;
+  std::conditional_t r_addend;
+};
+
 // Relocation entry without explicit addend or info (relative relocations 
only).
 typedef Elf64_Xword Elf64_Relr; // offset/bitmap for relative relocations
 
diff --git a/llvm/include/llvm/MC/MCELFExtras.h 
b/llvm/include/llvm/MC/MCELFExtras.h
new file mode 100644
index 0..81c5ddf02dbc5
--- /dev/null
+++ b/llvm/include/llvm/MC/MCELFExtras.h
@@ -0,0 +1,60 @@
+//===- MCELFExtras.h - Extra functions for ELF --*- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_MC_MCELFEXTRAS_H
+#define LLVM_MC_MCELFEXTRAS_H
+
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/BinaryFormat/ELF.h"
+#include "llvm/Support/LEB128.h"
+#include "llvm/Support/raw_ostream.h"
+
+#include 
+#include 
+
+namespace llvm::ELF {
+template 
+void encodeCrel(raw_ostream &OS, RelocsTy Relocs, F ToCrel) {
+  using uint = std::conditional_t;
+  uint OffsetMask = 8, Offset = 0, Addend = 0;
+  uint32_t SymIdx = 0, Type = 0;
+  for (const auto &R : Relocs)
+OffsetMask |= ToCrel(R).r_offset;
+  const int Shift = llvm::countr_zero(OffsetMask);
+  encodeULEB128(Relocs.size() * 8 + ELF::CREL_HDR_ADDEND + Shift, OS);
+  for (const auto &R : Relocs) {
+auto CR = ToCrel(R);
+auto DeltaOffset = static_cast((CR.r_offset - Offset) >> Shift);
+Offset = CR.r_offset;
+uint8_t B = (DeltaOffset << 3) + (SymIdx != CR.r_symidx) +
+(Type != CR.r_type ? 2 : 0) +
+(Addend != uint(CR.r_addend) ? 4 : 0);
+if (DeltaOffset < 0x10) {
+  OS << char(B);
+} else {
+  OS << char(B | 0x80);
+  encodeULEB128(DeltaOffset >> 4, OS);
+}
+// Delta symidx/type/addend members (SLEB128).
+if (B & 1) {
+  encodeSLEB128(static_cast(CR.r_symidx - SymIdx), OS);
+  SymIdx = CR.r_symidx;
+}
+if (B & 2) {
+  encodeSLEB128(static_cast(CR.r_type - Type), OS);
+  Type = CR.r_type;
+}
+if (B & 4) {
+  encodeSLEB128(std::make_signed_t(CR.r_addend - Addend), OS);
+  Addend = CR.r_addend;
+}
+  }
+}
+} // namespace llvm::ELF
+
+#endif
diff --git a/llvm/lib/MC/ELFObjectWriter.cpp b/llvm/lib/MC/ELFObjectWriter.cpp
index bcc6dfeeeccd6..771afff09ec5d 100644
--- a/llvm/lib/MC/ELFObjectWriter.cpp
+++ b/llvm/lib/MC/ELFObjectWriter.cpp
@@ -23,6 +23,7 @@
 #include "llvm/MC/MCAsmInfo.h"
 #include "llvm/MC/MCAssembler.h"
 #include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCELFExtras.h"
 #include "llvm/MC/MCELFObjectWriter.h"
 #include "llvm/MC/MCExpr.h"
 #include "llvm/MC/MCFixup.h"
@@ -919,44 +920,14 @@ void ELFWriter::WriteSecHdrEntry(uint32_t Name, uint32_t 
Type, uint64_t Flags,
   WriteWord(EntrySize); // sh_entsize
 }
 
-template 
+template 
 static void encodeCrel(ArrayRef Relocs, raw_ostream &OS) {
-  uint OffsetMask = 8, Offset = 0, Addend = 0;
-  uint32_t SymIdx = 0, Type = 0;
-  // hdr & 4 indicates 3 flag bits in delta offset and flags members.
-  for (const ELFRelocationEntry &Entry : Relocs)
-OffsetMask |= Entry.Offset;
-  const int Shift = llvm::countr_zero(OffsetMask);
-  encodeULEB128(Relocs.size() * 8 + ELF::CREL_HDR_AD