date:20241018

[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-18 Thread Kyungwoo Lee via llvm-branch-commits


https://github.com/kyulee-com ready_for_review 
https://github.com/llvm/llvm-project/pull/112638
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-llvm-ir

Author: Kyungwoo Lee (kyulee-com)


Changes

This computes a structural hash while allowing selected operands to be
ignored, as decided by a caller-provided function. Instead of a single hash
value, it now returns a FunctionHashInfo, which holds the hash value, an
instruction mapping, and a map from the location of each ignored operand to
that operand's hash value.

Depends on https://github.com/llvm/llvm-project/pull/112621.
This is a patch for 
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
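
The idea in the description can be illustrated with a small standalone sketch. It does not use the LLVM API: `ToyInst`, `ToyHashInfo`, `hashString`, `combine`, and `toyHashWithDifferences` are hypothetical names chosen for this example, and the FNV-1a style hashing is a stand-in for LLVM's stable hashing. Ignored operands are left out of the function hash, but their hashes are recorded by (instruction index, operand index) so that two otherwise identical functions can be compared later.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an IR instruction: an opcode plus textual operands.
struct ToyInst {
  std::string Opcode;
  std::vector<std::string> Operands;
};

using ToyHash = std::uint64_t;
using ToyIndexPair = std::pair<unsigned, unsigned>; // (instruction, operand)
using ToyIgnoreFunc = std::function<bool(const ToyInst &, unsigned)>;

struct ToyHashInfo {
  ToyHash FunctionHash = 0;
  // Hashes of the operands that were skipped, keyed by their location.
  std::map<ToyIndexPair, ToyHash> IgnoredOperandHashes;
};

// FNV-1a over a string (an assumption for this sketch, not LLVM's hash).
inline ToyHash hashString(const std::string &S) {
  ToyHash H = 1469598103934665603ULL;
  for (unsigned char C : S) {
    H ^= C;
    H *= 1099511628211ULL;
  }
  return H;
}

inline ToyHash combine(ToyHash A, ToyHash B) {
  return (A ^ B) * 1099511628211ULL;
}

ToyHashInfo toyHashWithDifferences(const std::vector<ToyInst> &Body,
                                   const ToyIgnoreFunc &IgnoreOp) {
  ToyHashInfo Info;
  for (unsigned I = 0; I < Body.size(); ++I) {
    Info.FunctionHash = combine(Info.FunctionHash, hashString(Body[I].Opcode));
    for (unsigned Op = 0; Op < Body[I].Operands.size(); ++Op) {
      ToyHash OpHash = hashString(Body[I].Operands[Op]);
      if (IgnoreOp && IgnoreOp(Body[I], Op))
        Info.IgnoredOperandHashes[ToyIndexPair(I, Op)] = OpHash; // recorded only
      else
        Info.FunctionHash = combine(Info.FunctionHash, OpHash);
    }
  }
  return Info;
}
```

With a callback that ignores the callee operand of calls, two bodies that differ only in the function they call get the same function hash, while the recorded operand hashes show where they differ.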

---
Full diff: https://github.com/llvm/llvm-project/pull/112638.diff


3 Files Affected:

- (modified) llvm/include/llvm/IR/StructuralHash.h (+46) 
- (modified) llvm/lib/IR/StructuralHash.cpp (+174-14) 
- (modified) llvm/unittests/IR/StructuralHashTest.cpp (+55) 


``diff
diff --git a/llvm/include/llvm/IR/StructuralHash.h b/llvm/include/llvm/IR/StructuralHash.h
index aa292bc3446799..bc82c204c4d1f6 100644
--- a/llvm/include/llvm/IR/StructuralHash.h
+++ b/llvm/include/llvm/IR/StructuralHash.h
@@ -14,7 +14,9 @@
 #ifndef LLVM_IR_STRUCTURALHASH_H
 #define LLVM_IR_STRUCTURALHASH_H
 
+#include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/StableHashing.h"
+#include "llvm/IR/Instruction.h"
 #include 
 
 namespace llvm {
@@ -23,6 +25,7 @@ class Function;
 class Module;
 
 using IRHash = stable_hash;
+using OpndHash = stable_hash;
 
 /// Returns a hash of the function \p F.
 /// \param F The function to hash.
@@ -37,6 +40,49 @@ IRHash StructuralHash(const Function &F, bool DetailedHash = false);
 /// composed the module hash.
 IRHash StructuralHash(const Module &M, bool DetailedHash = false);
 
+/// The pair of an instruction index and a operand index.
+using IndexPair = std::pair<unsigned, unsigned>;
+
+/// A map from an instruction index to an instruction pointer.
+using IndexInstrMap = MapVector<unsigned, Instruction *>;
+
+/// A map from an IndexPair to an OpndHash.
+using IndexOperandHashMapType = DenseMap<IndexPair, OpndHash>;
+
+/// A function that takes an instruction and an operand index and returns true
+/// if the operand should be ignored in the function hash computation.
+using IgnoreOperandFunc = std::function<bool(const Instruction *, unsigned)>;
+
+struct FunctionHashInfo {
+  /// A hash value representing the structural content of the function
+  IRHash FunctionHash;
+  /// A mapping from instruction indices to instruction pointers
+  std::unique_ptr<IndexInstrMap> IndexInstruction;
+  /// A mapping from pairs of instruction indices and operand indices
+  /// to the hashes of the operands. This can be used to analyze or
+  /// reconstruct the differences in ignored operands
+  std::unique_ptr<IndexOperandHashMapType> IndexOperandHashMap;
+
+  FunctionHashInfo(IRHash FuntionHash,
+                   std::unique_ptr<IndexInstrMap> IndexInstruction,
+                   std::unique_ptr<IndexOperandHashMapType> IndexOperandHashMap)
+      : FunctionHash(FuntionHash),
+        IndexInstruction(std::move(IndexInstruction)),
+        IndexOperandHashMap(std::move(IndexOperandHashMap)) {}
+};
+
+/// Computes a structural hash of a given function, considering the structure
+/// and content of the function's instructions while allowing for selective
+/// ignoring of certain operands based on custom criteria. This hash can be used
+/// to identify functions that are structurally similar or identical, which is
+/// useful in optimizations, deduplication, or analysis tasks.
+/// \param F The function to hash.
+/// \param IgnoreOp A callable that takes an instruction and an operand index,
+/// and returns true if the operand should be ignored in the hash computation.
+/// \return A FunctionHashInfo structure
+FunctionHashInfo StructuralHashWithDifferences(const Function &F,
+   IgnoreOperandFunc IgnoreOp);
+
 } // end namespace llvm
 
 #endif
diff --git a/llvm/lib/IR/StructuralHash.cpp b/llvm/lib/IR/StructuralHash.cpp
index a1fabab77d52b2..6e0af666010a05 100644
--- a/llvm/lib/IR/StructuralHash.cpp
+++ b/llvm/lib/IR/StructuralHash.cpp
@@ -28,6 +28,19 @@ class StructuralHashImpl {
 
   bool DetailedHash;
 
+  /// IgnoreOp is a function that returns true if the operand should be ignored.
+  IgnoreOperandFunc IgnoreOp = nullptr;
+  /// A mapping from instruction indices to instruction pointers.
+  /// The index represents the position of an instruction based on the order in
+  /// which it is first encountered.
+  std::unique_ptr<IndexInstrMap> IndexInstruction = nullptr;
+  /// A mapping from pairs of instruction indices and operand indices
+  /// to the hashes of the operands.
+  std::unique_ptr<IndexOperandHashMapType> IndexOperandHashMap = nullptr;
+
+  /// Assign a unique ID to each Value in the order they are first seen.
+  DenseMap ValueToId;
+
   // This will produce different values on 32-bit and 64-bit systens as
   // hash_combine returns a size_t. However, this is only used for
   // detailed hashing which, in-tree, only needs to distinguish between
@@ -47,24 +60,140 @@ class StructuralHashImpl {
 
 public:
   StructuralHashImpl() = delete;
-  explicit StructuralHashImpl(bool DetailedHash) : De

[llvm-branch-commits] [lld] [PAC][lld] Do not emit warnings for `-z pac-plt` with valid PAuth core info (PR #112959)

2024-10-18 Thread Daniil Kovalev via llvm-branch-commits


kovdan01 wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite.

* **#112959** 👈
* **#112958**
* `main`

This stack of pull requests is managed by Graphite.

https://github.com/llvm/llvm-project/pull/112959

[llvm-branch-commits] [lld] [PAC][lld] Do not emit warnings for `-z pac-plt` with valid PAuth core info (PR #112959)

2024-10-18 Thread Daniil Kovalev via llvm-branch-commits


https://github.com/kovdan01 created 
https://github.com/llvm/llvm-project/pull/112959

When PAuth core info is present and (platform,version) is not (0,0),
treat input files as pac-enabled and do not emit a warning with
`-z pac-plt` passed.
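
The condition described above can be sketched in isolation. This is a standalone illustration, not lld code: `hasValidPauthCoreInfo` and `shouldWarnPacPlt` are hypothetical helper names, and the core info is modelled as a plain byte vector in which "(platform,version) is not (0,0)" is taken to mean "at least one byte is non-zero", mirroring the `llvm::any_of` check in the patch.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Bit 1 of the AArch64 GNU property feature word marks PAC support.
constexpr std::uint32_t kFeature1Pac = 1u << 1;

// Core info counts as valid when it is present and not all zeros.
bool hasValidPauthCoreInfo(const std::vector<std::uint8_t> &CoreInfo) {
  return !CoreInfo.empty() &&
         std::any_of(CoreInfo.begin(), CoreInfo.end(),
                     [](std::uint8_t C) { return C != 0; });
}

// Warn for a file only if -z pac-plt was requested and neither the PAC
// feature bit nor valid PAuth core info vouches for it.
bool shouldWarnPacPlt(bool ZPacPlt, std::uint32_t Features,
                      const std::vector<std::uint8_t> &CoreInfo) {
  return ZPacPlt &&
         !(hasValidPauthCoreInfo(CoreInfo) || (Features & kFeature1Pac));
}
```

Note that in the patch the core-info check is computed once for the whole link job, so a single input with valid core info suppresses the warning for every object file.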

>From 5420db8f3959f073f379466c340252a1816e1810 Mon Sep 17 00:00:00 2001
From: Daniil Kovalev 
Date: Fri, 18 Oct 2024 22:07:51 +0300
Subject: [PATCH] [PAC][lld] Do not emit warnings for `-z pac-plt` with valid
 PAuth core info

When PAuth core info is present and (platform,version) is not (0,0),
treat input files as pac-enabled and do not emit a warning with
`-z pac-plt` passed.
---
 lld/ELF/Driver.cpp   | 10 -
 lld/test/ELF/aarch64-feature-pac.s   |  2 +-
 lld/test/ELF/aarch64-feature-pauth.s | 58 ++--
 3 files changed, 64 insertions(+), 6 deletions(-)

diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index fb77e67e9fc5ca..c436be6b24e001 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -2753,6 +2753,10 @@ static void readSecurityNotes(Ctx &ctx) {
   referenceFileName = (*it)->getName();
 }
   }
+  bool hasValidPauthAbiCoreInfo =
+  (!ctx.aarch64PauthAbiCoreInfo.empty() &&
+   llvm::any_of(ctx.aarch64PauthAbiCoreInfo,
+[](uint8_t c) { return c != 0; }));
 
   for (ELFFileBase *f : ctx.objectFiles) {
 uint32_t features = f->andFeatures;
@@ -2789,9 +2793,11 @@ static void readSecurityNotes(Ctx &ctx) {
"GNU_PROPERTY_X86_FEATURE_1_IBT property");
   features |= GNU_PROPERTY_X86_FEATURE_1_IBT;
 }
-if (ctx.arg.zPacPlt && !(features & GNU_PROPERTY_AARCH64_FEATURE_1_PAC)) {
+if (ctx.arg.zPacPlt && !(hasValidPauthAbiCoreInfo ||
+ (features & GNU_PROPERTY_AARCH64_FEATURE_1_PAC))) {
   warn(toString(f) + ": -z pac-plt: file does not have "
- "GNU_PROPERTY_AARCH64_FEATURE_1_PAC property");
+ "GNU_PROPERTY_AARCH64_FEATURE_1_PAC property and no "
+ "valid PAuth core info present for this link job");
   features |= GNU_PROPERTY_AARCH64_FEATURE_1_PAC;
 }
 ctx.arg.andFeatures &= features;
diff --git a/lld/test/ELF/aarch64-feature-pac.s b/lld/test/ELF/aarch64-feature-pac.s
index b85a33216cb5bd..4fd1fd2acea737 100644
--- a/lld/test/ELF/aarch64-feature-pac.s
+++ b/lld/test/ELF/aarch64-feature-pac.s
@@ -82,7 +82,7 @@
 
 # RUN: ld.lld %t.o %t2.o -z pac-plt %t.so -o %tpacplt.exe 2>&1 | FileCheck -DFILE=%t2.o --check-prefix WARN %s
 
-# WARN: warning: [[FILE]]: -z pac-plt: file does not have GNU_PROPERTY_AARCH64_FEATURE_1_PAC property
+# WARN: warning: [[FILE]]: -z pac-plt: file does not have GNU_PROPERTY_AARCH64_FEATURE_1_PAC property and no valid PAuth core info present for this link job
 
 # RUN: llvm-readelf -n %tpacplt.exe | FileCheck --check-prefix=PACPROP %s
 # RUN: llvm-readelf --dynamic-table %tpacplt.exe | FileCheck --check-prefix PACDYN2 %s
diff --git a/lld/test/ELF/aarch64-feature-pauth.s b/lld/test/ELF/aarch64-feature-pauth.s
index 699a650d72295a..c11073dba86f24 100644
--- a/lld/test/ELF/aarch64-feature-pauth.s
+++ b/lld/test/ELF/aarch64-feature-pauth.s
@@ -33,13 +33,53 @@
 # RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu no-info.s -o noinfo1.o
 # RUN: cp noinfo1.o noinfo2.o
 # RUN: not ld.lld -z pauth-report=error noinfo1.o tag1.o noinfo2.o -o /dev/null 2>&1 | FileCheck --check-prefix ERR5 %s
-# RUN: ld.lld -z pauth-report=warning noinfo1.o tag1.o noinfo2.o -o /dev/null 2>&1 | FileCheck --check-prefix WARN %s
+# RUN: ld.lld -z pauth-report=warning noinfo1.o tag1.o noinfo2.o -o /dev/null 2>&1 | FileCheck --check-prefix WARN1 %s
 # RUN: ld.lld -z pauth-report=none noinfo1.o tag1.o noinfo2.o --fatal-warnings -o /dev/null
 
 # ERR5:  error: noinfo1.o: -z pauth-report: file does not have AArch64 PAuth core info while 'tag1.o' has one
 # ERR5-NEXT: error: noinfo2.o: -z pauth-report: file does not have AArch64 PAuth core info while 'tag1.o' has one
-# WARN:  warning: noinfo1.o: -z pauth-report: file does not have AArch64 PAuth core info while 'tag1.o' has one
-# WARN-NEXT: warning: noinfo2.o: -z pauth-report: file does not have AArch64 PAuth core info while 'tag1.o' has one
+# WARN1:  warning: noinfo1.o: -z pauth-report: file does not have AArch64 PAuth core info while 'tag1.o' has one
+# WARN1-NEXT: warning: noinfo2.o: -z pauth-report: file does not have AArch64 PAuth core info while 'tag1.o' has one
+
+# RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu abi-tag-zero.s    -o tag-zero.o
+# RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu %p/Inputs/aarch64-func2.s -o func2.o
+# RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu %p/Inputs/aarch64-func3.s -o func3.o
+# RUN: ld.lld func3.o --shared -o func3.so
+# RUN: ld.lld tag1.o func2.o func3.so -z pac-plt --shared -o pacplt-nowarn --fatal-warnings
+# RUN: ld.lld tag-zero.o func2.o func3.so -z pac-plt --shared -o pacplt-wa

[llvm-branch-commits] [lld] [PAC][lld] Do not emit warnings for `-z pac-plt` with valid PAuth core info (PR #112959)

2024-10-18 Thread Daniil Kovalev via llvm-branch-commits


https://github.com/kovdan01 ready_for_review 
https://github.com/llvm/llvm-project/pull/112959

[llvm-branch-commits] [lld] [PAC][lld] Do not emit warnings for `-z pac-plt` with valid PAuth core info (PR #112959)

2024-10-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-lld

Author: Daniil Kovalev (kovdan01)


Changes

When PAuth core info is present and (platform,version) is not (0,0),
treat input files as pac-enabled and do not emit a warning with
`-z pac-plt` passed.

---
Full diff: https://github.com/llvm/llvm-project/pull/112959.diff


3 Files Affected:

- (modified) lld/ELF/Driver.cpp (+8-2) 
- (modified) lld/test/ELF/aarch64-feature-pac.s (+1-1) 
- (modified) lld/test/ELF/aarch64-feature-pauth.s (+55-3) 


``diff
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index fb77e67e9fc5ca..c436be6b24e001 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -2753,6 +2753,10 @@ static void readSecurityNotes(Ctx &ctx) {
   referenceFileName = (*it)->getName();
 }
   }
+  bool hasValidPauthAbiCoreInfo =
+  (!ctx.aarch64PauthAbiCoreInfo.empty() &&
+   llvm::any_of(ctx.aarch64PauthAbiCoreInfo,
+[](uint8_t c) { return c != 0; }));
 
   for (ELFFileBase *f : ctx.objectFiles) {
 uint32_t features = f->andFeatures;
@@ -2789,9 +2793,11 @@ static void readSecurityNotes(Ctx &ctx) {
"GNU_PROPERTY_X86_FEATURE_1_IBT property");
   features |= GNU_PROPERTY_X86_FEATURE_1_IBT;
 }
-if (ctx.arg.zPacPlt && !(features & GNU_PROPERTY_AARCH64_FEATURE_1_PAC)) {
+if (ctx.arg.zPacPlt && !(hasValidPauthAbiCoreInfo ||
+ (features & GNU_PROPERTY_AARCH64_FEATURE_1_PAC))) {
   warn(toString(f) + ": -z pac-plt: file does not have "
- "GNU_PROPERTY_AARCH64_FEATURE_1_PAC property");
+ "GNU_PROPERTY_AARCH64_FEATURE_1_PAC property and no "
+ "valid PAuth core info present for this link job");
   features |= GNU_PROPERTY_AARCH64_FEATURE_1_PAC;
 }
 ctx.arg.andFeatures &= features;
diff --git a/lld/test/ELF/aarch64-feature-pac.s b/lld/test/ELF/aarch64-feature-pac.s
index b85a33216cb5bd..4fd1fd2acea737 100644
--- a/lld/test/ELF/aarch64-feature-pac.s
+++ b/lld/test/ELF/aarch64-feature-pac.s
@@ -82,7 +82,7 @@
 
 # RUN: ld.lld %t.o %t2.o -z pac-plt %t.so -o %tpacplt.exe 2>&1 | FileCheck -DFILE=%t2.o --check-prefix WARN %s
 
-# WARN: warning: [[FILE]]: -z pac-plt: file does not have GNU_PROPERTY_AARCH64_FEATURE_1_PAC property
+# WARN: warning: [[FILE]]: -z pac-plt: file does not have GNU_PROPERTY_AARCH64_FEATURE_1_PAC property and no valid PAuth core info present for this link job
 
 # RUN: llvm-readelf -n %tpacplt.exe | FileCheck --check-prefix=PACPROP %s
 # RUN: llvm-readelf --dynamic-table %tpacplt.exe | FileCheck --check-prefix PACDYN2 %s
diff --git a/lld/test/ELF/aarch64-feature-pauth.s b/lld/test/ELF/aarch64-feature-pauth.s
index 699a650d72295a..c11073dba86f24 100644
--- a/lld/test/ELF/aarch64-feature-pauth.s
+++ b/lld/test/ELF/aarch64-feature-pauth.s
@@ -33,13 +33,53 @@
 # RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu no-info.s -o noinfo1.o
 # RUN: cp noinfo1.o noinfo2.o
 # RUN: not ld.lld -z pauth-report=error noinfo1.o tag1.o noinfo2.o -o /dev/null 2>&1 | FileCheck --check-prefix ERR5 %s
-# RUN: ld.lld -z pauth-report=warning noinfo1.o tag1.o noinfo2.o -o /dev/null 2>&1 | FileCheck --check-prefix WARN %s
+# RUN: ld.lld -z pauth-report=warning noinfo1.o tag1.o noinfo2.o -o /dev/null 2>&1 | FileCheck --check-prefix WARN1 %s
 # RUN: ld.lld -z pauth-report=none noinfo1.o tag1.o noinfo2.o --fatal-warnings -o /dev/null
 
 # ERR5:  error: noinfo1.o: -z pauth-report: file does not have AArch64 PAuth core info while 'tag1.o' has one
 # ERR5-NEXT: error: noinfo2.o: -z pauth-report: file does not have AArch64 PAuth core info while 'tag1.o' has one
-# WARN:  warning: noinfo1.o: -z pauth-report: file does not have AArch64 PAuth core info while 'tag1.o' has one
-# WARN-NEXT: warning: noinfo2.o: -z pauth-report: file does not have AArch64 PAuth core info while 'tag1.o' has one
+# WARN1:  warning: noinfo1.o: -z pauth-report: file does not have AArch64 PAuth core info while 'tag1.o' has one
+# WARN1-NEXT: warning: noinfo2.o: -z pauth-report: file does not have AArch64 PAuth core info while 'tag1.o' has one
+
+# RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu abi-tag-zero.s    -o tag-zero.o
+# RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu %p/Inputs/aarch64-func2.s -o func2.o
+# RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu %p/Inputs/aarch64-func3.s -o func3.o
+# RUN: ld.lld func3.o --shared -o func3.so
+# RUN: ld.lld tag1.o func2.o func3.so -z pac-plt --shared -o pacplt-nowarn --fatal-warnings
+# RUN: ld.lld tag-zero.o func2.o func3.so -z pac-plt --shared -o pacplt-warn 2>&1 | FileCheck --check-prefix WARN2 %s
+
+# WARN2:  warning: tag-zero.o: -z pac-plt: file does not have GNU_PROPERTY_AARCH64_FEATURE_1_PAC property and no valid PAuth core info present for this link job
+# WARN2-NEXT: warning: func2.o: -z pac-plt: file does not have 
GNU_PROPERTY_AARCH64_FEATURE_1_PAC property and no valid PA

[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-18 Thread Kyungwoo Lee via llvm-branch-commits


https://github.com/kyulee-com edited 
https://github.com/llvm/llvm-project/pull/112638

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RBLegalize rules for load (PR #112882)

2024-10-18 Thread Petar Avramovic via llvm-branch-commits


https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/112882

>From f354d303a9addd878dbca7ba88ae71a196173518 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 17 Oct 2024 16:39:55 +0200
Subject: [PATCH] AMDGPU/GlobalISel: RBLegalize rules for load

Add IDs for bit widths that cover multiple LLTs: B32, B64, etc.
Add a "Predicate" wrapper class for bool predicate functions, used to
write readable rules. Predicates can be combined using &&, || and !.
Add lowering for splitting and widening loads.
Write the rules for loads so that existing MIR tests from the old
regbankselect do not change.
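
A predicate wrapper that composes with &&, || and ! can be sketched as follows. This is a standalone illustration under assumptions, not the class from the patch: `ToyLoad`, `ToyPredicate`, and `exampleRule` are hypothetical names, and the wrapped functions inspect a small struct rather than a MachineInstr.

```cpp
#include <functional>
#include <utility>

// Hypothetical stand-in for the properties a load rule might inspect.
struct ToyLoad {
  unsigned SizeInBits;
  unsigned AlignInBytes;
  bool IsUniform;
};

class ToyPredicate {
  std::function<bool(const ToyLoad &)> Pred;

public:
  explicit ToyPredicate(std::function<bool(const ToyLoad &)> P)
      : Pred(std::move(P)) {}

  bool operator()(const ToyLoad &L) const { return Pred(L); }

  // Each combinator builds a new predicate that evaluates its operands
  // lazily; the inner built-in && and || still short-circuit.
  ToyPredicate operator&&(const ToyPredicate &RHS) const {
    ToyPredicate LHS = *this;
    return ToyPredicate(
        [LHS, RHS](const ToyLoad &L) { return LHS(L) && RHS(L); });
  }
  ToyPredicate operator||(const ToyPredicate &RHS) const {
    ToyPredicate LHS = *this;
    return ToyPredicate(
        [LHS, RHS](const ToyLoad &L) { return LHS(L) || RHS(L); });
  }
  ToyPredicate operator!() const {
    ToyPredicate Inner = *this;
    return ToyPredicate([Inner](const ToyLoad &L) { return !Inner(L); });
  }
};

// Example rule: uniform loads that are 32 bits wide or under-aligned.
inline bool exampleRule(const ToyLoad &L) {
  const ToyPredicate IsUniform([](const ToyLoad &Ld) { return Ld.IsUniform; });
  const ToyPredicate Is32Bit(
      [](const ToyLoad &Ld) { return Ld.SizeInBits == 32; });
  const ToyPredicate IsAlign4(
      [](const ToyLoad &Ld) { return Ld.AlignInBytes >= 4; });
  return (IsUniform && (Is32Bit || !IsAlign4))(L);
}
```

The point of the wrapper is that a rule table can state its condition as one expression instead of a hand-written lambda per rule.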
---
 .../Target/AMDGPU/AMDGPURBLegalizeHelper.cpp  | 302 -
 .../Target/AMDGPU/AMDGPURBLegalizeHelper.h|   7 +-
 .../Target/AMDGPU/AMDGPURBLegalizeRules.cpp   | 307 -
 .../lib/Target/AMDGPU/AMDGPURBLegalizeRules.h |  65 +++-
 .../AMDGPU/GlobalISel/regbankselect-load.mir  | 320 +++---
 .../GlobalISel/regbankselect-zextload.mir |   9 +-
 6 files changed, 942 insertions(+), 68 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURBLegalizeHelper.cpp b/llvm/lib/Target/AMDGPU/AMDGPURBLegalizeHelper.cpp
index a0f6ecedab7a83..f58f0a315096d2 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURBLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURBLegalizeHelper.cpp
@@ -37,6 +37,97 @@ bool RegBankLegalizeHelper::findRuleAndApplyMapping(MachineInstr &MI) {
   return true;
 }
 
+void RegBankLegalizeHelper::splitLoad(MachineInstr &MI,
+  ArrayRef<LLT> LLTBreakdown, LLT MergeTy) {
+  MachineFunction &MF = B.getMF();
+  assert(MI.getNumMemOperands() == 1);
+  MachineMemOperand &BaseMMO = **MI.memoperands_begin();
+  Register Dst = MI.getOperand(0).getReg();
+  const RegisterBank *DstRB = MRI.getRegBankOrNull(Dst);
+  Register BasePtrReg = MI.getOperand(1).getReg();
+  LLT PtrTy = MRI.getType(BasePtrReg);
+  const RegisterBank *PtrRB = MRI.getRegBankOrNull(BasePtrReg);
+  LLT OffsetTy = LLT::scalar(PtrTy.getSizeInBits());
+  SmallVector LoadPartRegs;
+
+  unsigned ByteOffset = 0;
+  for (LLT PartTy : LLTBreakdown) {
+Register BasePtrPlusOffsetReg;
+if (ByteOffset == 0) {
+  BasePtrPlusOffsetReg = BasePtrReg;
+} else {
+  BasePtrPlusOffsetReg = MRI.createVirtualRegister({PtrRB, PtrTy});
+  Register OffsetReg = MRI.createVirtualRegister({PtrRB, OffsetTy});
+  B.buildConstant(OffsetReg, ByteOffset);
+  B.buildPtrAdd(BasePtrPlusOffsetReg, BasePtrReg, OffsetReg);
+}
+MachineMemOperand *BasePtrPlusOffsetMMO =
+MF.getMachineMemOperand(&BaseMMO, ByteOffset, PartTy);
+Register PartLoad = MRI.createVirtualRegister({DstRB, PartTy});
+B.buildLoad(PartLoad, BasePtrPlusOffsetReg, *BasePtrPlusOffsetMMO);
+LoadPartRegs.push_back(PartLoad);
+ByteOffset += PartTy.getSizeInBytes();
+  }
+
+  if (!MergeTy.isValid()) {
+// Loads are of same size, concat or merge them together.
+B.buildMergeLikeInstr(Dst, LoadPartRegs);
+  } else {
+// Load(s) are not all of same size, need to unmerge them to smaller pieces
+// of MergeTy type, then merge them all together in Dst.
+SmallVector MergeTyParts;
+for (Register Reg : LoadPartRegs) {
+  if (MRI.getType(Reg) == MergeTy) {
+MergeTyParts.push_back(Reg);
+  } else {
+auto Unmerge = B.buildUnmerge(MergeTy, Reg);
+for (unsigned i = 0; i < Unmerge->getNumOperands() - 1; ++i) {
+  Register UnmergeReg = Unmerge->getOperand(i).getReg();
+  MRI.setRegBank(UnmergeReg, *DstRB);
+  MergeTyParts.push_back(UnmergeReg);
+}
+  }
+}
+B.buildMergeLikeInstr(Dst, MergeTyParts);
+  }
+  MI.eraseFromParent();
+}
+
+void RegBankLegalizeHelper::widenLoad(MachineInstr &MI, LLT WideTy,
+  LLT MergeTy) {
+  MachineFunction &MF = B.getMF();
+  assert(MI.getNumMemOperands() == 1);
+  MachineMemOperand &BaseMMO = **MI.memoperands_begin();
+  Register Dst = MI.getOperand(0).getReg();
+  const RegisterBank *DstRB = MRI.getRegBankOrNull(Dst);
+  Register BasePtrReg = MI.getOperand(1).getReg();
+
+  Register BasePtrPlusOffsetReg;
+  BasePtrPlusOffsetReg = BasePtrReg;
+
+  MachineMemOperand *BasePtrPlusOffsetMMO =
+  MF.getMachineMemOperand(&BaseMMO, 0, WideTy);
+  Register WideLoad = MRI.createVirtualRegister({DstRB, WideTy});
+  B.buildLoad(WideLoad, BasePtrPlusOffsetReg, *BasePtrPlusOffsetMMO);
+
+  if (WideTy.isScalar()) {
+B.buildTrunc(Dst, WideLoad);
+  } else {
+SmallVector MergeTyParts;
+unsigned NumEltsMerge =
+MRI.getType(Dst).getSizeInBits() / MergeTy.getSizeInBits();
+auto Unmerge = B.buildUnmerge(MergeTy, WideLoad);
+for (unsigned i = 0; i < Unmerge->getNumOperands() - 1; ++i) {
+  Register UnmergeReg = Unmerge->getOperand(i).getReg();
+  MRI.setRegBank(UnmergeReg, *DstRB);
+  if (i < NumEltsMerge)
+MergeTyParts.push_back(UnmergeReg);
+}
+B.buildMergeLikeInstr(Dst, MergeTyParts);
+  }
+  MI.erase

[llvm-branch-commits] [llvm] MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi (PR #112866)

2024-10-18 Thread Petar Avramovic via llvm-branch-commits


https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/112866

>From 317c41b80b26e55ee35c5859700d91d36e58cd2a Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 17 Oct 2024 15:43:06 +0200
Subject: [PATCH] MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi

Change the existing code to match what the LLVM IR version does via
PHINode::hasConstantOrUndefValue.
Most notably, this increases the number of values that can be allocated
to SGPRs in AMDGPU's RBSelect.
A common case is phis that appear in structurize-cfg lowering
for cycles with multiple exits:
the undef incoming value comes from a block that reached the cycle exit
condition; if the other incoming value is uniform, keep the phi uniform even
though it joins values from a pair of blocks that are entered
via a divergent conditional branch.
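
The rule described above can be sketched on toy data. This is a standalone illustration, not the MachineSSAContext code: `ToyIncoming` and `isConstantOrUndefPhi` are hypothetical names, and registers are plain integers. A phi counts as constant-or-undef when every incoming value is the phi itself, an undef, or one and the same register.

```cpp
#include <optional>
#include <vector>

struct ToyIncoming {
  int Reg;      // register id of the incoming value
  bool IsUndef; // true if the value is defined by an implicit-def
};

bool isConstantOrUndefPhi(int PhiReg,
                          const std::vector<ToyIncoming> &Incoming) {
  std::optional<int> Constant;
  for (const ToyIncoming &In : Incoming) {
    // Self-references and undef inputs never disqualify the phi.
    if (In.Reg == PhiReg || In.IsUndef)
      continue;
    // A second, different defined value means the phi is not constant.
    if (Constant && *Constant != In.Reg)
      return false;
    Constant = In.Reg;
  }
  return true;
}
```

For the multi-exit cycle case in the commit message, the phi joins one defined value with one undef, so it is accepted by this check.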
---
 llvm/lib/CodeGen/MachineSSAContext.cpp| 23 +-
 .../AMDGPU/MIR/hidden-diverge-gmir.mir| 28 +++
 .../AMDGPU/MIR/hidden-loop-diverge.mir|  4 +-
 .../AMDGPU/MIR/uses-value-from-cycle.mir  |  8 +-
 .../GlobalISel/divergence-structurizer.mir| 80 --
 .../regbankselect-mui-rb-legalize.mir | 70 
 .../regbankselect-mui-rb-select.mir   | 18 ++---
 .../AMDGPU/GlobalISel/regbankselect-mui.ll| 81 ++-
 .../AMDGPU/GlobalISel/regbankselect-mui.mir   | 52 ++--
 9 files changed, 188 insertions(+), 176 deletions(-)

diff --git a/llvm/lib/CodeGen/MachineSSAContext.cpp b/llvm/lib/CodeGen/MachineSSAContext.cpp
index e384187b6e8593..c21838d227e2d3 100644
--- a/llvm/lib/CodeGen/MachineSSAContext.cpp
+++ b/llvm/lib/CodeGen/MachineSSAContext.cpp
@@ -54,9 +54,28 @@ const MachineBasicBlock *MachineSSAContext::getDefBlock(Register value) const {
   return F->getRegInfo().getVRegDef(value)->getParent();
 }
 
+static bool isUndef(const MachineInstr &MI) {
+  return MI.getOpcode() == TargetOpcode::G_IMPLICIT_DEF ||
+ MI.getOpcode() == TargetOpcode::IMPLICIT_DEF;
+}
+
+/// MachineInstr equivalent of PHINode::hasConstantOrUndefValue()
 template <>
-bool MachineSSAContext::isConstantOrUndefValuePhi(const MachineInstr &Phi) {
-  return Phi.isConstantValuePHI();
+bool MachineSSAContext::isConstantOrUndefValuePhi(const MachineInstr &MI) {
+  if (!MI.isPHI())
+return false;
+  const MachineRegisterInfo &MRI = MI.getMF()->getRegInfo();
+  Register This = MI.getOperand(0).getReg();
+  Register ConstantValue;
+  for (unsigned i = 1, e = MI.getNumOperands(); i < e; i += 2) {
+Register Incoming = MI.getOperand(i).getReg();
+if (Incoming != This && !isUndef(*MRI.getVRegDef(Incoming))) {
+  if (ConstantValue && ConstantValue != Incoming)
+return false;
+  ConstantValue = Incoming;
+}
+  }
+  return true;
 }
 
 template <>
diff --git a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir
index ce00edf3363f77..9694a340b5e906 100644
--- a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir
+++ b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/hidden-diverge-gmir.mir
@@ -1,24 +1,24 @@
 # RUN: llc -mtriple=amdgcn-- -run-pass=print-machine-uniformity -o - %s 2>&1 | FileCheck %s
 # CHECK-LABEL: MachineUniformityInfo for function: hidden_diverge
 # CHECK-LABEL: BLOCK bb.0
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.workitem.id.x)
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_ICMP intpred(slt)
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_XOR %{{[0-9]*}}:_, %{{[0-9]*}}:_
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if)
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if)
-# CHECK: DIVERGENT: G_BRCOND %{{[0-9]*}}:_(s1), %bb.1
-# CHECK: DIVERGENT: G_BR %bb.2
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.workitem.id.x)
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_ICMP intpred(slt)
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_XOR %{{[0-9]*}}:_, %{{[0-9]*}}:_
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if)
+# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.if)
+# CHECK: DIVERGENT: G_BRCOND %{{[0-9]*}}:_(s1), %bb.1
+# CHECK: DIVERGENT: G_BR %bb.2
 # CHECK-LABEL: BLOCK bb.1
 # CHECK-LABEL: BLOCK bb.2
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s32) = G_PHI %{{[0-9]*}}:_(s32), %bb.1, %{{[0-9]*}}:_(s32), %bb.0
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1) = G_PHI %{{[0-9]*}}:_(s1), %bb.1, %{{[0-9]*}}:_(s1), %bb.0
-# CHECK: DIVERGENT: %{{[0-9]*}}: %{{[0-9]*}}:_(s1), %{{[0-9]*}}:_(s64) = 
G_INTRINSIC

[llvm-branch-commits] [flang] [flang][cuda] Translate cuf.register_kernel and cuf.register_module (PR #112972)

2024-10-18 Thread Valentin Clement バレンタインクレメン via llvm-branch-commits


https://github.com/clementval updated 
https://github.com/llvm/llvm-project/pull/112972

>From 49f9aa765e7ac39cf09e2ae12a656aa0c76b7f43 Mon Sep 17 00:00:00 2001
From: Valentin Clement 
Date: Thu, 17 Oct 2024 14:36:04 -0700
Subject: [PATCH 1/2] [flang][cuda] Translate cuf.register_kernel and
 cuf.register_module

---
 .../Dialect/CUF/CUFToLLVMIRTranslation.h  |  29 +
 .../include/flang/Optimizer/Support/InitFIR.h |   2 +
 .../include/flang/Runtime/CUDA/registration.h |  28 +
 .../lib/Optimizer/Dialect/CUF/CMakeLists.txt  |   1 +
 .../Dialect/CUF/CUFToLLVMIRTranslation.cpp| 104 ++
 .../Optimizer/Transforms/CufOpConversion.cpp  |   1 +
 flang/runtime/CUDA/CMakeLists.txt |   1 +
 flang/runtime/CUDA/registration.cpp   |  31 ++
 8 files changed, 197 insertions(+)
 create mode 100644 
flang/include/flang/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.h
 create mode 100644 flang/include/flang/Runtime/CUDA/registration.h
 create mode 100644 flang/lib/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.cpp
 create mode 100644 flang/runtime/CUDA/registration.cpp

diff --git a/flang/include/flang/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.h b/flang/include/flang/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.h
new file mode 100644
index 00..f3edb7fca649d0
--- /dev/null
+++ b/flang/include/flang/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.h
@@ -0,0 +1,29 @@
+//===- CUFToLLVMIRTranslation.h - CUF Dialect to LLVM IR *- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This provides registration calls for GPU dialect to LLVM IR translation.
+//
+//===--===//
+
+#ifndef FLANG_OPTIMIZER_DIALECT_CUF_GPUTOLLVMIRTRANSLATION_H_
+#define FLANG_OPTIMIZER_DIALECT_CUF_GPUTOLLVMIRTRANSLATION_H_
+
+namespace mlir {
+class DialectRegistry;
+class MLIRContext;
+} // namespace mlir
+
+namespace cuf {
+
+/// Register the CUF dialect and the translation from it to the LLVM IR in
+/// the given registry.
+void registerCUFDialectTranslation(mlir::DialectRegistry &registry);
+
+} // namespace cuf
+
+#endif // FLANG_OPTIMIZER_DIALECT_CUF_GPUTOLLVMIRTRANSLATION_H_
diff --git a/flang/include/flang/Optimizer/Support/InitFIR.h b/flang/include/flang/Optimizer/Support/InitFIR.h
index 04a5dd323e5508..1c61c367199923 100644
--- a/flang/include/flang/Optimizer/Support/InitFIR.h
+++ b/flang/include/flang/Optimizer/Support/InitFIR.h
@@ -14,6 +14,7 @@
 #define FORTRAN_OPTIMIZER_SUPPORT_INITFIR_H
 
 #include "flang/Optimizer/Dialect/CUF/CUFDialect.h"
+#include "flang/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.h"
 #include "flang/Optimizer/Dialect/FIRDialect.h"
 #include "flang/Optimizer/HLFIR/HLFIRDialect.h"
 #include "mlir/Conversion/Passes.h"
@@ -61,6 +62,7 @@ inline void addFIRExtensions(mlir::DialectRegistry &registry,
   if (addFIRInlinerInterface)
 addFIRInlinerExtension(registry);
   addFIRToLLVMIRExtension(registry);
+  cuf::registerCUFDialectTranslation(registry);
 }
 
 inline void loadNonCodegenDialects(mlir::MLIRContext &context) {
diff --git a/flang/include/flang/Runtime/CUDA/registration.h b/flang/include/flang/Runtime/CUDA/registration.h
new file mode 100644
index 00..cbe202c4d23e0d
--- /dev/null
+++ b/flang/include/flang/Runtime/CUDA/registration.h
@@ -0,0 +1,28 @@
+//===-- include/flang/Runtime/CUDA/registration.h ---*- C -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef FORTRAN_RUNTIME_CUDA_REGISTRATION_H_
+#define FORTRAN_RUNTIME_CUDA_REGISTRATION_H_
+
+#include "flang/Runtime/entry-names.h"
+#include 
+
+namespace Fortran::runtime::cuda {
+
+extern "C" {
+
+/// Register a CUDA module.
+void *RTDECL(CUFRegisterModule)(void *data);
+
+/// Register a device function.
+void RTDECL(CUFRegisterFunction)(void **module, const char *fct);
+
+} // extern "C"
+
+} // namespace Fortran::runtime::cuda
+#endif // FORTRAN_RUNTIME_CUDA_REGISTRATION_H_
diff --git a/flang/lib/Optimizer/Dialect/CUF/CMakeLists.txt b/flang/lib/Optimizer/Dialect/CUF/CMakeLists.txt
index b222115d58..5d4bd0785971f7 100644
--- a/flang/lib/Optimizer/Dialect/CUF/CMakeLists.txt
+++ b/flang/lib/Optimizer/Dialect/CUF/CMakeLists.txt
@@ -3,6 +3,7 @@ add_subdirectory(Attributes)
 add_flang_library(CUFDialect
   CUFDialect.cpp
   CUFOps.cpp
+  CUFToLLVMIRTranslation.cpp
 
   DEPENDS
   MLIRIR
diff --git a/flang/lib/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.cpp 
b/flang/lib/Optimizer/Dialect/CUF/CU

[llvm-branch-commits] [MC] Use StringRefs from pseudo_probe_desc section if it's mapped (PR #112996)

2024-10-18 Thread Amir Ayupov via llvm-branch-commits


https://github.com/aaupov created 
https://github.com/llvm/llvm-project/pull/112996

Add an `IsMMapped` flag to `buildGUID2FuncDescMap` that controls whether to
copy each function name into `FuncNameAllocator` or to use a StringRef
pointing into the section directly.

This saves ~0.7 GiB of peak RSS when perf2bolt processes a medium-sized
binary with a ~0.8 GiB .pseudo_probe_desc section.

The copy is unnecessary in BOLT because it keeps file sections in memory
while processing them, whereas llvm-profgen constructs GUID2FuncDescMap and
then releases the binary, so it still needs its own copies.

Test Plan: no-op for llvm-profgen, NFC for perf2bolt
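
The borrow-versus-copy trade-off described above can be sketched without the LLVM API. This is only an illustration under stated assumptions: `DescTable`, `FuncDesc`, and `copiedBytes` are hypothetical names, `std::string_view` stands in for `StringRef`, and a `std::deque<std::string>` stands in for `FuncNameAllocator`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a function-description record.
struct FuncDesc {
  std::string_view Name; // non-owning view, playing the role of StringRef
};

// Hypothetical stand-in for the decoder's name table.
class DescTable {
  // Owns copied names; std::deque keeps references to existing elements
  // valid when new elements are appended at the end.
  std::deque<std::string> NameStorage;
  std::vector<FuncDesc> Descs;
  std::size_t Copied = 0;

public:
  // When IsMMapped is true the caller promises that the bytes behind Name
  // outlive this table, so the view is stored as-is and nothing is copied.
  void add(std::string_view Name, bool IsMMapped) {
    if (IsMMapped) {
      Descs.push_back(FuncDesc{Name});
      return;
    }
    NameStorage.emplace_back(Name);
    Copied += Name.size();
    Descs.push_back(FuncDesc{NameStorage.back()});
  }

  const FuncDesc &get(std::size_t I) const {
    assert(I < Descs.size() && "index out of range");
    return Descs[I];
  }

  // Number of name bytes duplicated into NameStorage so far.
  std::size_t copiedBytes() const { return Copied; }
};
```

With a section buffer that stays alive, every name added with `IsMMapped = true` points into that buffer and `copiedBytes()` stays at zero; the copying path is only needed when the buffer is released before the table is used.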




[llvm-branch-commits] [MC] Use StringRefs from pseudo_probe_desc section if it's mapped (PR #112996)

2024-10-18 Thread via llvm-branch-commits


github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. 
:warning:



You can test this locally with the following command:


```bash
git-clang-format --diff b6275868f9955e64624e67921a24b02c2d5c67fe \
  0e8d8a451a59eb008800d039692fc17b4f903300 --extensions cpp,h -- \
  bolt/lib/Rewrite/PseudoProbeRewriter.cpp llvm/include/llvm/MC/MCPseudoProbe.h \
  llvm/lib/MC/MCPseudoProbe.cpp
```





View the diff from clang-format here.


```diff
diff --git a/bolt/lib/Rewrite/PseudoProbeRewriter.cpp b/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
index 4fecfe8c3c..09aa4fbb66 100644
--- a/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
+++ b/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
@@ -127,8 +127,8 @@ void PseudoProbeRewriter::parsePseudoProbe(bool ProfiledOnly) {
 
   StringRef Contents = PseudoProbeDescSection->getContents();
   if (!ProbeDecoder.buildGUID2FuncDescMap(
-          reinterpret_cast<const uint8_t *>(Contents.data()),
-          Contents.size(), /*IsMMapped*/true)) {
+          reinterpret_cast<const uint8_t *>(Contents.data()), Contents.size(),
+          /*IsMMapped*/ true)) {
     errs() << "BOLT-WARNING: fail in building GUID2FuncDescMap\n";
     return;
   }

```




https://github.com/llvm/llvm-project/pull/112996

[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-18 Thread Paul Kirth via llvm-branch-commits


https://github.com/ilovepi commented:

IIRC we have several lit tests that cover structural hash. Shouldn't we have a 
new test there that uses the new functionality?

https://github.com/llvm/llvm-project/pull/112638

[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-18 Thread Paul Kirth via llvm-branch-commits


https://github.com/ilovepi edited 
https://github.com/llvm/llvm-project/pull/112638

[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-18 Thread Paul Kirth via llvm-branch-commits



@@ -184,6 +329,12 @@ class StructuralHashImpl {
   }
 
   uint64_t getHash() const { return Hash; }
+  std::unique_ptr getIndexInstrMap() {
+return std::move(IndexInstruction);
+  }
+  std::unique_ptr getIndexPairOpndHashMap() {

ilovepi wrote:

```suggestion
  }
  
  std::unique_ptr getIndexPairOpndHashMap() {
```

https://github.com/llvm/llvm-project/pull/112638

[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-18 Thread Paul Kirth via llvm-branch-commits



@@ -47,24 +60,140 @@ class StructuralHashImpl {
 
 public:
   StructuralHashImpl() = delete;
-  explicit StructuralHashImpl(bool DetailedHash) : DetailedHash(DetailedHash) 
{}
+  explicit StructuralHashImpl(bool DetailedHash,
+  IgnoreOperandFunc IgnoreOp = nullptr)
+  : DetailedHash(DetailedHash), IgnoreOp(IgnoreOp) {
+if (IgnoreOp) {
+  IndexInstruction = std::make_unique();
+  IndexOperandHashMap = std::make_unique();
+}
+  }
 
-  stable_hash hashConstant(Constant *C) {
+  stable_hash hashAPInt(const APInt &I) {
 SmallVector Hashes;
-// TODO: hashArbitaryType() is not stable.
-if (ConstantInt *ConstInt = dyn_cast(C)) {
-  Hashes.emplace_back(hashArbitaryType(ConstInt->getValue()));
-} else if (ConstantFP *ConstFP = dyn_cast(C)) {
-  Hashes.emplace_back(hashArbitaryType(ConstFP->getValue()));
-} else if (Function *Func = dyn_cast(C))
-  // Hashing the name will be deterministic as LLVM's hashing 
infrastructure
-  // has explicit support for hashing strings and will not simply hash
-  // the pointer.
-  Hashes.emplace_back(hashArbitaryType(Func->getName()));
+Hashes.emplace_back(I.getBitWidth());
+for (unsigned J = 0; J < I.getNumWords(); ++J)
+  Hashes.emplace_back((I.getRawData())[J]);
+return stable_hash_combine(Hashes);
+  }
 
+  stable_hash hashAPFloat(const APFloat &F) {
+SmallVector Hashes;
+const fltSemantics &S = F.getSemantics();
+Hashes.emplace_back(APFloat::semanticsPrecision(S));
+Hashes.emplace_back(APFloat::semanticsMaxExponent(S));
+Hashes.emplace_back(APFloat::semanticsMinExponent(S));
+Hashes.emplace_back(APFloat::semanticsSizeInBits(S));
+Hashes.emplace_back(hashAPInt(F.bitcastToAPInt()));
 return stable_hash_combine(Hashes);
   }
 
+  stable_hash hashGlobalValue(const GlobalValue *GV) {
+if (!GV->hasName())
+  return 0;
+return stable_hash_name(GV->getName());
+  }
+
+  // Compute a hash for a Constant. This function is logically similar to
+  // FunctionComparator::cmpConstants() in FunctionComparator.cpp, but here
+  // we're interested in computing a hash rather than comparing two Constants.
+  // Some of the logic is simplified, e.g, we don't expand GEPOperator.
+  stable_hash hashConstant(Constant *C) {
+SmallVector Hashes;
+
+Type *Ty = C->getType();
+Hashes.emplace_back(hashType(Ty));
+
+if (C->isNullValue()) {
+  Hashes.emplace_back(static_cast('N'));
+  return stable_hash_combine(Hashes);
+}
+
+auto *G = dyn_cast(C);
+if (G) {
+  Hashes.emplace_back(hashGlobalValue(G));
+  return stable_hash_combine(Hashes);
+}
+
+if (const auto *Seq = dyn_cast(C)) {
+  Hashes.emplace_back(xxh3_64bits(Seq->getRawDataValues()));
+  return stable_hash_combine(Hashes);
+}
+
+switch (C->getValueID()) {
+case Value::UndefValueVal:
+case Value::PoisonValueVal:
+case Value::ConstantTokenNoneVal: {
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantIntVal: {
+  const APInt &Int = cast(C)->getValue();
+  Hashes.emplace_back(hashAPInt(Int));
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantFPVal: {
+  const APFloat &APF = cast(C)->getValueAPF();
+  Hashes.emplace_back(hashAPFloat(APF));
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantArrayVal: {
+  const ConstantArray *A = cast(C);
+  uint64_t NumElements = cast(Ty)->getNumElements();
+  Hashes.emplace_back(NumElements);
+  for (auto &Op : A->operands()) {
+auto H = hashConstant(cast(Op));
+Hashes.emplace_back(H);
+  }
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantStructVal: {
+  const ConstantStruct *S = cast(C);
+  unsigned NumElements = cast(Ty)->getNumElements();
+  Hashes.emplace_back(NumElements);
+  for (auto &Op : S->operands()) {
+auto H = hashConstant(cast(Op));
+Hashes.emplace_back(H);
+  }
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantVectorVal: {
+  const ConstantVector *V = cast(C);
+  unsigned NumElements = cast(Ty)->getNumElements();
+  Hashes.emplace_back(NumElements);
+  for (auto &Op : V->operands()) {
+auto H = hashConstant(cast(Op));
+Hashes.emplace_back(H);
+  }
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantExprVal: {
+  const ConstantExpr *E = cast(C);
+  unsigned NumOperands = E->getNumOperands();
+  Hashes.emplace_back(NumOperands);
+  for (auto &Op : E->operands()) {
+auto H = hashConstant(cast(Op));
+Hashes.emplace_back(H);
+  }
+  return stable_hash_combine(Hashes);
+}
+case Value::BlockAddressVal: {
+  const BlockAddress *BA = cast(C);
+  auto H = hashGlobalValue(BA->getFunction());
+  Hashes.emplace_back(H);
+  return stable_hash_co

[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-18 Thread Paul Kirth via llvm-branch-commits



@@ -47,24 +60,140 @@ class StructuralHashImpl {
 
 public:
   StructuralHashImpl() = delete;
-  explicit StructuralHashImpl(bool DetailedHash) : DetailedHash(DetailedHash) 
{}
+  explicit StructuralHashImpl(bool DetailedHash,
+  IgnoreOperandFunc IgnoreOp = nullptr)
+  : DetailedHash(DetailedHash), IgnoreOp(IgnoreOp) {
+if (IgnoreOp) {
+  IndexInstruction = std::make_unique();
+  IndexOperandHashMap = std::make_unique();
+}
+  }
 
-  stable_hash hashConstant(Constant *C) {
+  stable_hash hashAPInt(const APInt &I) {
 SmallVector Hashes;
-// TODO: hashArbitaryType() is not stable.
-if (ConstantInt *ConstInt = dyn_cast(C)) {
-  Hashes.emplace_back(hashArbitaryType(ConstInt->getValue()));
-} else if (ConstantFP *ConstFP = dyn_cast(C)) {
-  Hashes.emplace_back(hashArbitaryType(ConstFP->getValue()));
-} else if (Function *Func = dyn_cast(C))
-  // Hashing the name will be deterministic as LLVM's hashing 
infrastructure
-  // has explicit support for hashing strings and will not simply hash
-  // the pointer.
-  Hashes.emplace_back(hashArbitaryType(Func->getName()));
+Hashes.emplace_back(I.getBitWidth());
+for (unsigned J = 0; J < I.getNumWords(); ++J)
+  Hashes.emplace_back((I.getRawData())[J]);
+return stable_hash_combine(Hashes);
+  }
 
+  stable_hash hashAPFloat(const APFloat &F) {
+SmallVector Hashes;
+const fltSemantics &S = F.getSemantics();
+Hashes.emplace_back(APFloat::semanticsPrecision(S));
+Hashes.emplace_back(APFloat::semanticsMaxExponent(S));
+Hashes.emplace_back(APFloat::semanticsMinExponent(S));
+Hashes.emplace_back(APFloat::semanticsSizeInBits(S));
+Hashes.emplace_back(hashAPInt(F.bitcastToAPInt()));
 return stable_hash_combine(Hashes);
   }
 
+  stable_hash hashGlobalValue(const GlobalValue *GV) {
+if (!GV->hasName())
+  return 0;
+return stable_hash_name(GV->getName());
+  }
+
+  // Compute a hash for a Constant. This function is logically similar to
+  // FunctionComparator::cmpConstants() in FunctionComparator.cpp, but here
+  // we're interested in computing a hash rather than comparing two Constants.
+  // Some of the logic is simplified, e.g, we don't expand GEPOperator.
+  stable_hash hashConstant(Constant *C) {
+SmallVector Hashes;
+
+Type *Ty = C->getType();
+Hashes.emplace_back(hashType(Ty));
+
+if (C->isNullValue()) {
+  Hashes.emplace_back(static_cast('N'));
+  return stable_hash_combine(Hashes);
+}
+
+auto *G = dyn_cast(C);
+if (G) {
+  Hashes.emplace_back(hashGlobalValue(G));
+  return stable_hash_combine(Hashes);
+}
+
+if (const auto *Seq = dyn_cast(C)) {
+  Hashes.emplace_back(xxh3_64bits(Seq->getRawDataValues()));
+  return stable_hash_combine(Hashes);
+}
+
+switch (C->getValueID()) {
+case Value::UndefValueVal:
+case Value::PoisonValueVal:
+case Value::ConstantTokenNoneVal: {
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantIntVal: {
+  const APInt &Int = cast(C)->getValue();
+  Hashes.emplace_back(hashAPInt(Int));
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantFPVal: {
+  const APFloat &APF = cast(C)->getValueAPF();
+  Hashes.emplace_back(hashAPFloat(APF));
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantArrayVal: {
+  const ConstantArray *A = cast(C);
+  uint64_t NumElements = cast(Ty)->getNumElements();
+  Hashes.emplace_back(NumElements);
+  for (auto &Op : A->operands()) {

ilovepi wrote:

I think these can be `const`, right? Or does that conflict w/ the cast?
```suggestion
  for (const auto &Op : A->operands()) {
```

https://github.com/llvm/llvm-project/pull/112638

[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-18 Thread Paul Kirth via llvm-branch-commits



@@ -47,24 +60,140 @@ class StructuralHashImpl {
 
 public:
   StructuralHashImpl() = delete;
-  explicit StructuralHashImpl(bool DetailedHash) : DetailedHash(DetailedHash) 
{}
+  explicit StructuralHashImpl(bool DetailedHash,
+  IgnoreOperandFunc IgnoreOp = nullptr)
+  : DetailedHash(DetailedHash), IgnoreOp(IgnoreOp) {
+if (IgnoreOp) {
+  IndexInstruction = std::make_unique();
+  IndexOperandHashMap = std::make_unique();
+}
+  }
 
-  stable_hash hashConstant(Constant *C) {
+  stable_hash hashAPInt(const APInt &I) {
 SmallVector Hashes;
-// TODO: hashArbitaryType() is not stable.
-if (ConstantInt *ConstInt = dyn_cast(C)) {
-  Hashes.emplace_back(hashArbitaryType(ConstInt->getValue()));
-} else if (ConstantFP *ConstFP = dyn_cast(C)) {
-  Hashes.emplace_back(hashArbitaryType(ConstFP->getValue()));
-} else if (Function *Func = dyn_cast(C))
-  // Hashing the name will be deterministic as LLVM's hashing 
infrastructure
-  // has explicit support for hashing strings and will not simply hash
-  // the pointer.
-  Hashes.emplace_back(hashArbitaryType(Func->getName()));
+Hashes.emplace_back(I.getBitWidth());
+for (unsigned J = 0; J < I.getNumWords(); ++J)
+  Hashes.emplace_back((I.getRawData())[J]);
+return stable_hash_combine(Hashes);
+  }
 
+  stable_hash hashAPFloat(const APFloat &F) {
+SmallVector Hashes;
+const fltSemantics &S = F.getSemantics();
+Hashes.emplace_back(APFloat::semanticsPrecision(S));
+Hashes.emplace_back(APFloat::semanticsMaxExponent(S));
+Hashes.emplace_back(APFloat::semanticsMinExponent(S));
+Hashes.emplace_back(APFloat::semanticsSizeInBits(S));
+Hashes.emplace_back(hashAPInt(F.bitcastToAPInt()));
 return stable_hash_combine(Hashes);
   }
 
+  stable_hash hashGlobalValue(const GlobalValue *GV) {
+if (!GV->hasName())
+  return 0;
+return stable_hash_name(GV->getName());
+  }
+
+  // Compute a hash for a Constant. This function is logically similar to
+  // FunctionComparator::cmpConstants() in FunctionComparator.cpp, but here
+  // we're interested in computing a hash rather than comparing two Constants.
+  // Some of the logic is simplified, e.g, we don't expand GEPOperator.
+  stable_hash hashConstant(Constant *C) {
+SmallVector Hashes;
+
+Type *Ty = C->getType();
+Hashes.emplace_back(hashType(Ty));
+
+if (C->isNullValue()) {
+  Hashes.emplace_back(static_cast('N'));
+  return stable_hash_combine(Hashes);
+}
+
+auto *G = dyn_cast(C);
+if (G) {
+  Hashes.emplace_back(hashGlobalValue(G));
+  return stable_hash_combine(Hashes);
+}
+
+if (const auto *Seq = dyn_cast(C)) {
+  Hashes.emplace_back(xxh3_64bits(Seq->getRawDataValues()));
+  return stable_hash_combine(Hashes);
+}
+
+switch (C->getValueID()) {
+case Value::UndefValueVal:
+case Value::PoisonValueVal:
+case Value::ConstantTokenNoneVal: {
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantIntVal: {
+  const APInt &Int = cast(C)->getValue();
+  Hashes.emplace_back(hashAPInt(Int));
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantFPVal: {
+  const APFloat &APF = cast(C)->getValueAPF();
+  Hashes.emplace_back(hashAPFloat(APF));
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantArrayVal: {
+  const ConstantArray *A = cast(C);
+  uint64_t NumElements = cast(Ty)->getNumElements();
+  Hashes.emplace_back(NumElements);
+  for (auto &Op : A->operands()) {
+auto H = hashConstant(cast(Op));
+Hashes.emplace_back(H);
+  }
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantStructVal: {
+  const ConstantStruct *S = cast(C);
+  unsigned NumElements = cast(Ty)->getNumElements();
+  Hashes.emplace_back(NumElements);
+  for (auto &Op : S->operands()) {
+auto H = hashConstant(cast(Op));
+Hashes.emplace_back(H);
+  }
+  return stable_hash_combine(Hashes);

ilovepi wrote:

This pattern seems to be used quite a bit. Maybe it should be a helper function?
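
A minimal sketch of what such a helper could look like, written against toy types so that it is self-contained. `Node`, `combine`, `hashOperands`, and `hashNode` are hypothetical names; `combine` is a simple FNV-style mixer that only stands in for `stable_hash_combine`, and none of this is the code from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a constant that may have nested operands.
struct Node {
  std::uint64_t Leaf = 0;
  std::vector<Node> Operands;
};

// Hypothetical order-sensitive combiner (FNV-style mixing over 64-bit
// inputs). It only stands in for stable_hash_combine.
inline std::uint64_t combine(const std::vector<std::uint64_t> &Hashes) {
  std::uint64_t H = 1469598103934665603ULL;
  for (std::uint64_t V : Hashes) {
    H ^= V;
    H *= 1099511628211ULL;
  }
  return H;
}

std::uint64_t hashNode(const Node &N);

// The shared helper: record the operand count, then the hash of each
// operand in order. Each aggregate case can call this instead of
// repeating the loop.
inline void hashOperands(const std::vector<Node> &Ops,
                         std::vector<std::uint64_t> &Hashes) {
  Hashes.push_back(Ops.size());
  for (const Node &Op : Ops)
    Hashes.push_back(hashNode(Op));
}

inline std::uint64_t hashNode(const Node &N) {
  std::vector<std::uint64_t> Hashes;
  Hashes.push_back(N.Leaf);
  hashOperands(N.Operands, Hashes);
  return combine(Hashes);
}
```

In the patch itself, the array, struct, vector, and constant-expression cases all push an element count followed by one hash per operand, so a helper of this shape would replace four near-identical loops.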

https://github.com/llvm/llvm-project/pull/112638

[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-18 Thread Ellis Hoag via llvm-branch-commits



@@ -47,24 +60,140 @@ class StructuralHashImpl {
 
 public:
   StructuralHashImpl() = delete;
-  explicit StructuralHashImpl(bool DetailedHash) : DetailedHash(DetailedHash) 
{}
+  explicit StructuralHashImpl(bool DetailedHash,
+  IgnoreOperandFunc IgnoreOp = nullptr)
+  : DetailedHash(DetailedHash), IgnoreOp(IgnoreOp) {
+if (IgnoreOp) {
+  IndexInstruction = std::make_unique();
+  IndexOperandHashMap = std::make_unique();
+}
+  }
 
-  stable_hash hashConstant(Constant *C) {
+  stable_hash hashAPInt(const APInt &I) {
 SmallVector Hashes;
-// TODO: hashArbitaryType() is not stable.
-if (ConstantInt *ConstInt = dyn_cast(C)) {
-  Hashes.emplace_back(hashArbitaryType(ConstInt->getValue()));
-} else if (ConstantFP *ConstFP = dyn_cast(C)) {
-  Hashes.emplace_back(hashArbitaryType(ConstFP->getValue()));
-} else if (Function *Func = dyn_cast(C))
-  // Hashing the name will be deterministic as LLVM's hashing 
infrastructure
-  // has explicit support for hashing strings and will not simply hash
-  // the pointer.
-  Hashes.emplace_back(hashArbitaryType(Func->getName()));
+Hashes.emplace_back(I.getBitWidth());
+for (unsigned J = 0; J < I.getNumWords(); ++J)
+  Hashes.emplace_back((I.getRawData())[J]);

ellishg wrote:

I also wonder what the difference is between `getNumWords()` and 
`getActiveWords()`.

```suggestion
for (uint64_t Word : ArrayRef(I.getRawData(), I.getNumWords()))
  Hashes.emplace_back(Word);
```

Actually, I wonder if this will work since `ArrayRef` can become a 
`SmallVector`. Or maybe I'm just being too fancy :)
```
Hashes.append(ArrayRef(I.getRawData(), I.getNumWords()));
```
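
On the `getNumWords()` versus `getActiveWords()` question, a toy model may help. To the best of my understanding, `APInt::getNumWords()` counts the 64-bit words needed for the full bit width, while `getActiveWords()` counts only the words up to the most significant set bit (at least one). The sketch below mirrors that distinction with a hypothetical `WideInt` type; it is not `llvm::APInt`, and `hashInputs` is an illustrative helper, not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy fixed-width integer: BitWidth bits stored in little-endian 64-bit
// words. A stand-in for illustration only; this is not llvm::APInt.
struct WideInt {
  unsigned BitWidth;
  std::vector<std::uint64_t> Words;

  WideInt(unsigned Bits, std::uint64_t Low)
      : BitWidth(Bits), Words((Bits + 63) / 64, 0) {
    assert(Bits > 0 && "zero-width values are not modelled");
    // Keep only the bits that fit when the value is narrower than a word.
    Words[0] = Bits < 64 ? (Low & ((std::uint64_t{1} << Bits) - 1)) : Low;
  }

  // Words needed to store the full bit width.
  unsigned numWords() const { return static_cast<unsigned>(Words.size()); }

  // Words up to and including the highest non-zero word, at least one.
  unsigned activeWords() const {
    for (std::size_t I = Words.size(); I > 1; --I)
      if (Words[I - 1] != 0)
        return static_cast<unsigned>(I);
    return 1;
  }
};

// Collect the values that would be fed to a hash combiner: the bit width
// followed by every storage word, each kept as a full 64-bit value.
inline std::vector<std::uint64_t> hashInputs(const WideInt &V) {
  std::vector<std::uint64_t> Inputs;
  Inputs.push_back(V.BitWidth);
  Inputs.insert(Inputs.end(), V.Words.begin(), V.Words.end());
  return Inputs;
}
```

For a 128-bit value holding 5, `numWords()` is 2 and `activeWords()` is 1. Because the bit width is part of the hash input, either word count keeps values of different widths apart in this model; iterating over all words simply makes the input length depend on the type alone.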

https://github.com/llvm/llvm-project/pull/112638

[llvm-branch-commits] [llvm] [MC] Use StringRefs from pseudo_probe_desc section if it's mapped (PR #112996)

2024-10-18 Thread Amir Ayupov via llvm-branch-commits


https://github.com/aaupov updated 
https://github.com/llvm/llvm-project/pull/112996

>From a54e4f1f17c153272583eda3f7a2bbd7a928b34d Mon Sep 17 00:00:00 2001
From: Amir Ayupov 
Date: Fri, 18 Oct 2024 18:24:17 -0700
Subject: [PATCH] clang-format

Created using spr 1.3.4
---
 bolt/lib/Rewrite/PseudoProbeRewriter.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/bolt/lib/Rewrite/PseudoProbeRewriter.cpp 
b/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
index 4fecfe8c3c09b1..09aa4fbb66bd42 100644
--- a/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
+++ b/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
@@ -127,8 +127,8 @@ void PseudoProbeRewriter::parsePseudoProbe(bool 
ProfiledOnly) {
 
   StringRef Contents = PseudoProbeDescSection->getContents();
   if (!ProbeDecoder.buildGUID2FuncDescMap(
-  reinterpret_cast(Contents.data()),
-  Contents.size(), /*IsMMapped*/true)) {
+  reinterpret_cast(Contents.data()), Contents.size(),
+  /*IsMMapped*/ true)) {
 errs() << "BOLT-WARNING: fail in building GUID2FuncDescMap\n";
 return;
   }


[llvm-branch-commits] [llvm] [CGData] Global Merge Functions (PR #112671)

2024-10-18 Thread Kyungwoo Lee via llvm-branch-commits


https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/112671

>From ded5771bb4ff7c8fd5401b4efe0af988539a8162 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Fri, 30 Aug 2024 00:09:09 -0700
Subject: [PATCH 1/2] [CGData] Global Merge Functions

---
 llvm/include/llvm/CGData/CodeGenData.h|  11 +
 llvm/include/llvm/InitializePasses.h  |   1 +
 llvm/include/llvm/LinkAllPasses.h |   1 +
 llvm/include/llvm/Passes/CodeGenPassBuilder.h |   1 +
 llvm/include/llvm/Transforms/IPO.h|   2 +
 .../Transforms/IPO/GlobalMergeFunctions.h |  77 ++
 llvm/lib/CodeGen/TargetPassConfig.cpp |   3 +
 llvm/lib/LTO/LTO.cpp  |   1 +
 llvm/lib/Transforms/IPO/CMakeLists.txt|   2 +
 .../Transforms/IPO/GlobalMergeFunctions.cpp   | 687 ++
 .../ThinLTO/AArch64/cgdata-merge-local.ll |  62 ++
 .../test/ThinLTO/AArch64/cgdata-merge-read.ll |  82 +++
 .../AArch64/cgdata-merge-two-rounds.ll|  68 ++
 .../ThinLTO/AArch64/cgdata-merge-write.ll |  97 +++
 llvm/tools/llvm-lto2/CMakeLists.txt   |   1 +
 llvm/tools/llvm-lto2/llvm-lto2.cpp|   6 +
 16 files changed, 1102 insertions(+)
 create mode 100644 llvm/include/llvm/Transforms/IPO/GlobalMergeFunctions.h
 create mode 100644 llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-local.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-read.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-two-rounds.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-write.ll

diff --git a/llvm/include/llvm/CGData/CodeGenData.h 
b/llvm/include/llvm/CGData/CodeGenData.h
index 5d7c74725ccef1..da0e412f2a0e03 100644
--- a/llvm/include/llvm/CGData/CodeGenData.h
+++ b/llvm/include/llvm/CGData/CodeGenData.h
@@ -145,6 +145,9 @@ class CodeGenData {
   const OutlinedHashTree *getOutlinedHashTree() {
 return PublishedHashTree.get();
   }
+  const StableFunctionMap *getStableFunctionMap() {
+return PublishedStableFunctionMap.get();
+  }
 
   /// Returns true if we should write codegen data.
   bool emitCGData() { return EmitCGData; }
@@ -169,10 +172,18 @@ inline bool hasOutlinedHashTree() {
   return CodeGenData::getInstance().hasOutlinedHashTree();
 }
 
+inline bool hasStableFunctionMap() {
+  return CodeGenData::getInstance().hasStableFunctionMap();
+}
+
 inline const OutlinedHashTree *getOutlinedHashTree() {
   return CodeGenData::getInstance().getOutlinedHashTree();
 }
 
+inline const StableFunctionMap *getStableFunctionMap() {
+  return CodeGenData::getInstance().getStableFunctionMap();
+}
+
 inline bool emitCGData() { return CodeGenData::getInstance().emitCGData(); }
 
 inline void
diff --git a/llvm/include/llvm/InitializePasses.h 
b/llvm/include/llvm/InitializePasses.h
index 4352099d6dbb99..9aa36d5bb7f801 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -123,6 +123,7 @@ void initializeGCEmptyBasicBlocksPass(PassRegistry &);
 void initializeGCMachineCodeAnalysisPass(PassRegistry &);
 void initializeGCModuleInfoPass(PassRegistry &);
 void initializeGVNLegacyPassPass(PassRegistry &);
+void initializeGlobalMergeFuncPass(PassRegistry &);
 void initializeGlobalMergePass(PassRegistry &);
 void initializeGlobalsAAWrapperPassPass(PassRegistry &);
 void initializeHardwareLoopsLegacyPass(PassRegistry &);
diff --git a/llvm/include/llvm/LinkAllPasses.h 
b/llvm/include/llvm/LinkAllPasses.h
index 92b59a66567c95..ea3609a2b4bc71 100644
--- a/llvm/include/llvm/LinkAllPasses.h
+++ b/llvm/include/llvm/LinkAllPasses.h
@@ -79,6 +79,7 @@ struct ForcePassLinking {
 (void)llvm::createDomOnlyViewerWrapperPassPass();
 (void)llvm::createDomViewerWrapperPassPass();
 (void)llvm::createAlwaysInlinerLegacyPass();
+(void)llvm::createGlobalMergeFuncPass();
 (void)llvm::createGlobalsAAWrapperPass();
 (void)llvm::createInstSimplifyLegacyPass();
 (void)llvm::createInstructionCombiningPass();
diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h 
b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index 13bc4700d87029..96b5b815132bc0 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -74,6 +74,7 @@
 #include "llvm/Target/CGPassBuilderOption.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/CFGuard.h"
+#include "llvm/Transforms/IPO/GlobalMergeFunctions.h"
 #include "llvm/Transforms/Scalar/ConstantHoisting.h"
 #include "llvm/Transforms/Scalar/LoopPassManager.h"
 #include "llvm/Transforms/Scalar/LoopStrengthReduce.h"
diff --git a/llvm/include/llvm/Transforms/IPO.h 
b/llvm/include/llvm/Transforms/IPO.h
index ee0e35aa618325..86a8654f56997c 100644
--- a/llvm/include/llvm/Transforms/IPO.h
+++ b/llvm/include/llvm/Transforms/IPO.h
@@ -55,6 +55,8 @@ enum class PassSummaryAction {
   Export, ///< Export information to summary.
 };
 
+Pass *createGlobalMergeF

[llvm-branch-commits] [llvm] MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi (PR #112866)

2024-10-18 Thread Matt Arsenault via llvm-branch-commits



@@ -54,9 +54,28 @@ const MachineBasicBlock 
*MachineSSAContext::getDefBlock(Register value) const {
   return F->getRegInfo().getVRegDef(value)->getParent();
 }
 
+static bool isUndef(const MachineInstr &MI) {
+  return MI.getOpcode() == TargetOpcode::G_IMPLICIT_DEF ||
+ MI.getOpcode() == TargetOpcode::IMPLICIT_DEF;
+}
+
+/// MachineInstr equivalent of PHINode::hasConstantOrUndefValue()
 template <>
-bool MachineSSAContext::isConstantOrUndefValuePhi(const MachineInstr &Phi) {
-  return Phi.isConstantValuePHI();
+bool MachineSSAContext::isConstantOrUndefValuePhi(const MachineInstr &MI) {
+  if (!MI.isPHI())
+return false;
+  const MachineRegisterInfo &MRI = MI.getMF()->getRegInfo();
+  Register This = MI.getOperand(0).getReg();
+  Register ConstantValue;
+  for (unsigned i = 1, e = MI.getNumOperands(); i < e; i += 2) {
+Register Incoming = MI.getOperand(i).getReg();
+if (Incoming != This && !isUndef(*MRI.getVRegDef(Incoming))) {

arsenm wrote:

While we should eventually disallow undef operands in all SSA MIR, the current 
verifier rules effectively only guarantee this for G_PHI. A regular PHI may 
appear at some point with an undef operand, and `getVRegDef` can then fail.
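
One possible way to guard against a missing definition, sketched with toy types rather than the MachineIR API. `RegInfo`, `Instr`, `Opcode`, and `isUndefReg` are hypothetical names, and treating "no defining instruction" as undef is an assumption made for this illustration, not something the review settles.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical opcodes; ImplicitDef plays the role of IMPLICIT_DEF.
enum class Opcode { ImplicitDef, Copy, Phi };

struct Instr {
  Opcode Op;
};

// Hypothetical register table. The lookup returns nullptr when a register
// has no defining instruction, which is the failure mode the comment
// above is concerned about.
class RegInfo {
  std::unordered_map<unsigned, Instr> Defs;

public:
  void define(unsigned Reg, Opcode Op) {
    Defs.insert_or_assign(Reg, Instr{Op});
  }

  const Instr *getDef(unsigned Reg) const {
    auto It = Defs.find(Reg);
    return It == Defs.end() ? nullptr : &It->second;
  }
};

// Check the pointer before dereferencing. In this sketch a register
// without a definition is treated the same as an implicit def.
inline bool isUndefReg(const RegInfo &RI, unsigned Reg) {
  const Instr *Def = RI.getDef(Reg);
  return !Def || Def->Op == Opcode::ImplicitDef;
}
```

The point is only that the result of the lookup is tested before use; whether a PHI operand without a definition should count as undef is a decision for the actual patch.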

https://github.com/llvm/llvm-project/pull/112866

[llvm-branch-commits] [lld] [PAC][lld] Do not emit warnings for `-z pac-plt` with valid PAuth core info (PR #112959)

2024-10-18 Thread Daniel Kiss via llvm-branch-commits



@@ -33,13 +33,53 @@
 # RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu no-info.s -o noinfo1.o
 # RUN: cp noinfo1.o noinfo2.o
 # RUN: not ld.lld -z pauth-report=error noinfo1.o tag1.o noinfo2.o -o 
/dev/null 2>&1 | FileCheck --check-prefix ERR5 %s
-# RUN: ld.lld -z pauth-report=warning noinfo1.o tag1.o noinfo2.o -o /dev/null 
2>&1 | FileCheck --check-prefix WARN %s
+# RUN: ld.lld -z pauth-report=warning noinfo1.o tag1.o noinfo2.o -o /dev/null 
2>&1 | FileCheck --check-prefix WARN1 %s
 # RUN: ld.lld -z pauth-report=none noinfo1.o tag1.o noinfo2.o --fatal-warnings 
-o /dev/null
 
 # ERR5:  error: noinfo1.o: -z pauth-report: file does not have AArch64 
PAuth core info while 'tag1.o' has one
 # ERR5-NEXT: error: noinfo2.o: -z pauth-report: file does not have AArch64 
PAuth core info while 'tag1.o' has one
-# WARN:  warning: noinfo1.o: -z pauth-report: file does not have AArch64 
PAuth core info while 'tag1.o' has one
-# WARN-NEXT: warning: noinfo2.o: -z pauth-report: file does not have AArch64 
PAuth core info while 'tag1.o' has one
+# WARN1:  warning: noinfo1.o: -z pauth-report: file does not have AArch64 
PAuth core info while 'tag1.o' has one
+# WARN1-NEXT: warning: noinfo2.o: -z pauth-report: file does not have AArch64 
PAuth core info while 'tag1.o' has one
+
+# RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu abi-tag-zero.s  
  -o tag-zero.o
+# RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu 
%p/Inputs/aarch64-func2.s -o func2.o
+# RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu 
%p/Inputs/aarch64-func3.s -o func3.o
+# RUN: ld.lld func3.o --shared -o func3.so
+# RUN: ld.lld tag1.o func2.o func3.so -z pac-plt --shared -o pacplt-nowarn 
--fatal-warnings
+# RUN: ld.lld tag-zero.o func2.o func3.so -z pac-plt --shared -o pacplt-warn 
2>&1 | FileCheck --check-prefix WARN2 %s
+
+# WARN2:  warning: tag-zero.o: -z pac-plt: file does not have 
GNU_PROPERTY_AARCH64_FEATURE_1_PAC property and no valid PAuth core info 
present for this link job
+# WARN2-NEXT: warning: func2.o: -z pac-plt: file does not have 
GNU_PROPERTY_AARCH64_FEATURE_1_PAC property and no valid PAuth core info 
present for this link job

DanielKristofKiss wrote:

Why is this warning not present when linking `pacplt-nowarn` with 
`--fatal-warnings`, given that func2.o is the same for both link commands?


https://github.com/llvm/llvm-project/pull/112959

[llvm-branch-commits] [MC] Use StringRefs from pseudo_probe_desc section if it's mapped (PR #112996)

2024-10-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-mc

Author: Amir Ayupov (aaupov)


Changes

Add an `IsMMapped` flag to `buildGUID2FuncDescMap` that controls whether to
copy each function name into `FuncNameAllocator` or to use a StringRef
pointing into the section directly.

This saves ~0.7 GiB of peak RSS when perf2bolt processes a medium-sized
binary with a ~0.8 GiB .pseudo_probe_desc section.

The copy is unnecessary in BOLT because it keeps file sections in memory
while processing them, whereas llvm-profgen constructs GUID2FuncDescMap and
then releases the binary, so it still needs its own copies.

Test Plan: no-op for llvm-profgen, NFC for perf2bolt


---
Full diff: https://github.com/llvm/llvm-project/pull/112996.diff


3 Files Affected:

- (modified) bolt/lib/Rewrite/PseudoProbeRewriter.cpp (+1-1) 
- (modified) llvm/include/llvm/MC/MCPseudoProbe.h (+4-1) 
- (modified) llvm/lib/MC/MCPseudoProbe.cpp (+4-2) 


``diff
diff --git a/bolt/lib/Rewrite/PseudoProbeRewriter.cpp 
b/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
index 8647df4b0edf82..4fecfe8c3c09b1 100644
--- a/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
+++ b/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
@@ -128,7 +128,7 @@ void PseudoProbeRewriter::parsePseudoProbe(bool 
ProfiledOnly) {
   StringRef Contents = PseudoProbeDescSection->getContents();
   if (!ProbeDecoder.buildGUID2FuncDescMap(
   reinterpret_cast(Contents.data()),
-  Contents.size())) {
+  Contents.size(), /*IsMMapped*/true)) {
 errs() << "BOLT-WARNING: fail in building GUID2FuncDescMap\n";
 return;
   }
diff --git a/llvm/include/llvm/MC/MCPseudoProbe.h 
b/llvm/include/llvm/MC/MCPseudoProbe.h
index 4bfae9eba1a0aa..fd1f0557895446 100644
--- a/llvm/include/llvm/MC/MCPseudoProbe.h
+++ b/llvm/include/llvm/MC/MCPseudoProbe.h
@@ -431,7 +431,10 @@ class MCPseudoProbeDecoder {
   using Uint64Map = DenseMap;
 
   // Decode pseudo_probe_desc section to build GUID to PseudoProbeFuncDesc map.
-  bool buildGUID2FuncDescMap(const uint8_t *Start, std::size_t Size);
+  // If pseudo_probe_desc section is mapped to memory and \p IsMMapped is true,
+  // uses StringRefs pointing to the section.
+  bool buildGUID2FuncDescMap(const uint8_t *Start, std::size_t Size,
+ bool IsMMapped = false);
 
   // Decode pseudo_probe section to count the number of probes and inlined
   // function records for each function record.
diff --git a/llvm/lib/MC/MCPseudoProbe.cpp b/llvm/lib/MC/MCPseudoProbe.cpp
index 90d7588407068a..2a3761b2cfe718 100644
--- a/llvm/lib/MC/MCPseudoProbe.cpp
+++ b/llvm/lib/MC/MCPseudoProbe.cpp
@@ -375,7 +375,8 @@ ErrorOr 
MCPseudoProbeDecoder::readString(uint32_t Size) {
 }
 
 bool MCPseudoProbeDecoder::buildGUID2FuncDescMap(const uint8_t *Start,
- std::size_t Size) {
+ std::size_t Size,
+ bool IsMMapped) {
   // The pseudo_probe_desc section has a format like:
   // .section .pseudo_probe_desc,"",@progbits
   // .quad -5182264717993193164   // GUID
@@ -422,7 +423,8 @@ bool MCPseudoProbeDecoder::buildGUID2FuncDescMap(const 
uint8_t *Start,
 StringRef Name = cantFail(errorOrToExpected(readString(NameSize)));
 
 // Initialize PseudoProbeFuncDesc and populate it into GUID2FuncDescMap
-GUID2FuncDescMap.emplace_back(GUID, Hash, Name.copy(FuncNameAllocator));
+GUID2FuncDescMap.emplace_back(
+GUID, Hash, IsMMapped ? Name : Name.copy(FuncNameAllocator));
   }
   assert(Data == End && "Have unprocessed data in pseudo_probe_desc section");
   assert(GUID2FuncDescMap.size() == FuncDescCount &&

``
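
The `IsMMapped` switch in the diff above chooses between aliasing the section bytes and copying each name into an allocator. A minimal standalone sketch of that trade-off follows, assuming C++17; `NameTable`, `Owned`, `Names` and `nameTableExample` are hypothetical names for illustration and are not part of LLVM.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the decoder's name storage; not LLVM code.
struct NameTable {
  std::deque<std::string> Owned;       // plays the role of FuncNameAllocator
  std::vector<std::string_view> Names; // plays the role of GUID2FuncDescMap

  void add(std::string_view Name, bool IsMMapped) {
    if (IsMMapped) {
      // The section outlives the table, so alias its bytes directly.
      Names.push_back(Name);
    } else {
      // The buffer may go away, so keep a private copy. A deque never
      // relocates existing elements on emplace_back, so the view stays valid.
      Owned.emplace_back(Name);
      Names.push_back(Owned.back());
    }
  }
};

void nameTableExample() {
  std::string Section = "main"; // pretend this is the mapped section
  NameTable T;
  T.add(Section, /*IsMMapped=*/true);
  T.add(Section, /*IsMMapped=*/false);
  assert(T.Names[0].data() == Section.data()); // aliases the section
  assert(T.Names[1].data() != Section.data()); // points at the copy
}
```

The aliasing path avoids one allocation per function name, but it is only safe while the underlying buffer stays alive, which is presumably why the new parameter defaults to `false`.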




https://github.com/llvm/llvm-project/pull/112996

[llvm-branch-commits] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #111192)

2024-10-18 Thread Akash Banerjee via llvm-branch-commits


https://github.com/TIFitis approved this pull request.

LGTM :)

https://github.com/llvm/llvm-project/pull/92

[llvm-branch-commits] [llvm] AMDGPU: Mark grid size loads with range metadata (PR #113019)

2024-10-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

Only handles the v5 case.
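
The range attached by `annotateGridSizeLoadWithRangeMD` in the diff that follows is the half-open interval [1, MaxNumGroups + 1). A rough standalone sketch of just that arithmetic is given here; `gridSizeRange` and `gridSizeRangeExample` are hypothetical names, and the stated reasons for the two early exits are assumptions rather than something the patch description confirms.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <utility>

// Hypothetical helper: returns the half-open range [Lo, Hi) that a grid-size
// load could be annotated with, or nullopt when no range is attached.
std::optional<std::pair<uint32_t, uint32_t>>
gridSizeRange(uint32_t MaxNumGroups) {
  // Assumption: 0 means no limit was given. UINT32_MAX is also skipped; one
  // visible reason is that MaxNumGroups + 1 would wrap to 0 in 32 bits.
  if (MaxNumGroups == 0 ||
      MaxNumGroups == std::numeric_limits<uint32_t>::max())
    return std::nullopt;
  return std::pair<uint32_t, uint32_t>{1, MaxNumGroups + 1};
}

void gridSizeRangeExample() {
  auto R = gridSizeRange(36);
  assert(R && R->first == 1 && R->second == 37); // values 1..36 inclusive
  assert(!gridSizeRange(0));                      // nothing to annotate
}
```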

---
Full diff: https://github.com/llvm/llvm-project/pull/113019.diff


3 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp (+29-4) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp (+1) 
- (added) llvm/test/CodeGen/AMDGPU/amdgpu-max-num-workgroups-load-annotate.ll 
(+124) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
index 1bb5e794da7dd6..5fc0c36359b6f5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
@@ -23,6 +23,7 @@
 #include "llvm/IR/InstIterator.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/IntrinsicsAMDGPU.h"
+#include "llvm/IR/MDBuilder.h"
 #include "llvm/IR/PatternMatch.h"
 #include "llvm/Pass.h"
 
@@ -83,6 +84,20 @@ Function *getBasePtrIntrinsic(Module &M, bool IsV5OrAbove) {
 
 } // end anonymous namespace
 
+static void annotateGridSizeLoadWithRangeMD(LoadInst *Load,
+uint32_t MaxNumGroups) {
+  if (MaxNumGroups == 0 || MaxNumGroups == std::numeric_limits<uint32_t>::max())
+return;
+
+  if (!Load->getType()->isIntegerTy(32))
+return;
+
+  // TODO: If there is existing range metadata, preserve it if it is stricter.
+  MDBuilder MDB(Load->getContext());
+  MDNode *Range = MDB.createRange(APInt(32, 1), APInt(32, MaxNumGroups + 1));
+  Load->setMetadata(LLVMContext::MD_range, Range);
+}
+
 static bool processUse(CallInst *CI, bool IsV5OrAbove) {
   Function *F = CI->getParent()->getParent();
 
@@ -92,7 +107,11 @@ static bool processUse(CallInst *CI, bool IsV5OrAbove) {
   const bool HasUniformWorkGroupSize =
 F->getFnAttribute("uniform-work-group-size").getValueAsBool();
 
-  if (!HasReqdWorkGroupSize && !HasUniformWorkGroupSize)
+  SmallVector<unsigned> MaxNumWorkgroups =
+  AMDGPU::getIntegerVecAttribute(*F, "amdgpu-max-num-workgroups", 3);
+
+  if (!HasReqdWorkGroupSize && !HasUniformWorkGroupSize &&
+  none_of(MaxNumWorkgroups, [](unsigned X) { return X != 0; }))
 return false;
 
   Value *BlockCounts[3] = {nullptr, nullptr, nullptr};
@@ -133,16 +152,22 @@ static bool processUse(CallInst *CI, bool IsV5OrAbove) {
 if (IsV5OrAbove) { // Base is ImplicitArgPtr.
   switch (Offset) {
   case HIDDEN_BLOCK_COUNT_X:
-if (LoadSize == 4)
+if (LoadSize == 4) {
   BlockCounts[0] = Load;
+  annotateGridSizeLoadWithRangeMD(Load, MaxNumWorkgroups[0]);
+}
 break;
   case HIDDEN_BLOCK_COUNT_Y:
-if (LoadSize == 4)
+if (LoadSize == 4) {
   BlockCounts[1] = Load;
+  annotateGridSizeLoadWithRangeMD(Load, MaxNumWorkgroups[1]);
+}
 break;
   case HIDDEN_BLOCK_COUNT_Z:
-if (LoadSize == 4)
+if (LoadSize == 4) {
   BlockCounts[2] = Load;
+  annotateGridSizeLoadWithRangeMD(Load, MaxNumWorkgroups[2]);
+}
 break;
   case HIDDEN_GROUP_SIZE_X:
 if (LoadSize == 2)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
index 961a9220b48d6b..5a899b755419ae 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
@@ -369,6 +369,7 @@ const AMDGPUSubtarget &AMDGPUSubtarget::get(const 
TargetMachine &TM, const Funct
   TM.getSubtarget(F));
 }
 
+// FIXME: This has no reason to be in subtarget
 SmallVector
 AMDGPUSubtarget::getMaxNumWorkGroups(const Function &F) const {
   return AMDGPU::getIntegerVecAttribute(F, "amdgpu-max-num-workgroups", 3);
diff --git 
a/llvm/test/CodeGen/AMDGPU/amdgpu-max-num-workgroups-load-annotate.ll 
b/llvm/test/CodeGen/AMDGPU/amdgpu-max-num-workgroups-load-annotate.ll
new file mode 100644
index 00..9064292129928f
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-max-num-workgroups-load-annotate.ll
@@ -0,0 +1,124 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py 
UTC_ARGS: --check-globals all --version 5
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa 
-passes=amdgpu-lower-kernel-attributes %s | FileCheck %s
+
+define i32 @use_grid_size_x_max_num_workgroups() #0 {
+; CHECK-LABEL: define i32 @use_grid_size_x_max_num_workgroups(
+; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:[[IMPLICITARG_PTR:%.*]] = call ptr addrspace(4) 
@llvm.amdgcn.implicitarg.ptr()
+; CHECK-NEXT:[[GRID_SIZE_X:%.*]] = load i32, ptr addrspace(4) 
[[IMPLICITARG_PTR]], align 4, !range [[RNG0:![0-9]+]]
+; CHECK-NEXT:ret i32 [[GRID_SIZE_X]]
+;
+  %implicitarg.ptr = call ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
+  %grid.size.x = load i32, ptr addrspace(4) %implicitarg.ptr, align 4
+  ret i32 %grid.size.x
+}
+
+define i32 @use_grid_size_x_max_num_workgroups_existing_nonzero_range() #0 {
+; CHECK-LABEL: define i32 
@use

[llvm-branch-commits] [llvm] AMDGPU: Mark grid size loads with range metadata (PR #113019)

2024-10-18 Thread Matt Arsenault via llvm-branch-commits


https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/113019

Only handles the v5 case.

>From 297173fb6ae137fb1695457c8b722f2d6e4acf86 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sat, 19 Oct 2024 02:18:45 +0400
Subject: [PATCH] AMDGPU: Mark grid size loads with range metadata

Only handles the v5 case.
---
 .../AMDGPU/AMDGPULowerKernelAttributes.cpp|  33 -
 llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp|   1 +
 ...amdgpu-max-num-workgroups-load-annotate.ll | 124 ++
 3 files changed, 154 insertions(+), 4 deletions(-)
 create mode 100644 
llvm/test/CodeGen/AMDGPU/amdgpu-max-num-workgroups-load-annotate.ll

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
index 1bb5e794da7dd6..5fc0c36359b6f5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
@@ -23,6 +23,7 @@
 #include "llvm/IR/InstIterator.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/IntrinsicsAMDGPU.h"
+#include "llvm/IR/MDBuilder.h"
 #include "llvm/IR/PatternMatch.h"
 #include "llvm/Pass.h"
 
@@ -83,6 +84,20 @@ Function *getBasePtrIntrinsic(Module &M, bool IsV5OrAbove) {
 
 } // end anonymous namespace
 
+static void annotateGridSizeLoadWithRangeMD(LoadInst *Load,
+uint32_t MaxNumGroups) {
+  if (MaxNumGroups == 0 || MaxNumGroups == std::numeric_limits<uint32_t>::max())
+return;
+
+  if (!Load->getType()->isIntegerTy(32))
+return;
+
+  // TODO: If there is existing range metadata, preserve it if it is stricter.
+  MDBuilder MDB(Load->getContext());
+  MDNode *Range = MDB.createRange(APInt(32, 1), APInt(32, MaxNumGroups + 1));
+  Load->setMetadata(LLVMContext::MD_range, Range);
+}
+
 static bool processUse(CallInst *CI, bool IsV5OrAbove) {
   Function *F = CI->getParent()->getParent();
 
@@ -92,7 +107,11 @@ static bool processUse(CallInst *CI, bool IsV5OrAbove) {
   const bool HasUniformWorkGroupSize =
 F->getFnAttribute("uniform-work-group-size").getValueAsBool();
 
-  if (!HasReqdWorkGroupSize && !HasUniformWorkGroupSize)
+  SmallVector<unsigned> MaxNumWorkgroups =
+  AMDGPU::getIntegerVecAttribute(*F, "amdgpu-max-num-workgroups", 3);
+
+  if (!HasReqdWorkGroupSize && !HasUniformWorkGroupSize &&
+  none_of(MaxNumWorkgroups, [](unsigned X) { return X != 0; }))
 return false;
 
   Value *BlockCounts[3] = {nullptr, nullptr, nullptr};
@@ -133,16 +152,22 @@ static bool processUse(CallInst *CI, bool IsV5OrAbove) {
 if (IsV5OrAbove) { // Base is ImplicitArgPtr.
   switch (Offset) {
   case HIDDEN_BLOCK_COUNT_X:
-if (LoadSize == 4)
+if (LoadSize == 4) {
   BlockCounts[0] = Load;
+  annotateGridSizeLoadWithRangeMD(Load, MaxNumWorkgroups[0]);
+}
 break;
   case HIDDEN_BLOCK_COUNT_Y:
-if (LoadSize == 4)
+if (LoadSize == 4) {
   BlockCounts[1] = Load;
+  annotateGridSizeLoadWithRangeMD(Load, MaxNumWorkgroups[1]);
+}
 break;
   case HIDDEN_BLOCK_COUNT_Z:
-if (LoadSize == 4)
+if (LoadSize == 4) {
   BlockCounts[2] = Load;
+  annotateGridSizeLoadWithRangeMD(Load, MaxNumWorkgroups[2]);
+}
 break;
   case HIDDEN_GROUP_SIZE_X:
 if (LoadSize == 2)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
index 961a9220b48d6b..5a899b755419ae 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
@@ -369,6 +369,7 @@ const AMDGPUSubtarget &AMDGPUSubtarget::get(const 
TargetMachine &TM, const Funct
   TM.getSubtarget(F));
 }
 
+// FIXME: This has no reason to be in subtarget
 SmallVector
 AMDGPUSubtarget::getMaxNumWorkGroups(const Function &F) const {
   return AMDGPU::getIntegerVecAttribute(F, "amdgpu-max-num-workgroups", 3);
diff --git 
a/llvm/test/CodeGen/AMDGPU/amdgpu-max-num-workgroups-load-annotate.ll 
b/llvm/test/CodeGen/AMDGPU/amdgpu-max-num-workgroups-load-annotate.ll
new file mode 100644
index 00..9064292129928f
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-max-num-workgroups-load-annotate.ll
@@ -0,0 +1,124 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py 
UTC_ARGS: --check-globals all --version 5
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa 
-passes=amdgpu-lower-kernel-attributes %s | FileCheck %s
+
+define i32 @use_grid_size_x_max_num_workgroups() #0 {
+; CHECK-LABEL: define i32 @use_grid_size_x_max_num_workgroups(
+; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:[[IMPLICITARG_PTR:%.*]] = call ptr addrspace(4) 
@llvm.amdgcn.implicitarg.ptr()
+; CHECK-NEXT:[[GRID_SIZE_X:%.*]] = load i32, ptr addrspace(4) 
[[IMPLICITARG_PTR]], align 4, !range [[RNG0:![0-9]+]]
+; CHECK-NEXT:ret i32 [[GRID_SIZE_X]]
+;
+  %implicitarg.ptr = call ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()

[llvm-branch-commits] [llvm] AMDGPU: Mark grid size loads with range metadata (PR #113019)

2024-10-18 Thread Matt Arsenault via llvm-branch-commits


https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/113019

[llvm-branch-commits] [llvm] AMDGPU: Mark grid size loads with range metadata (PR #113019)

2024-10-18 Thread Matt Arsenault via llvm-branch-commits


arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/113019

* **#113019** 👈
* **#113018**
* `main`

This stack of pull requests is managed by Graphite.

https://github.com/llvm/llvm-project/pull/113019

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RBLegalize rules for load (PR #112882)

2024-10-18 Thread Petar Avramovic via llvm-branch-commits


https://github.com/petar-avramovic ready_for_review 
https://github.com/llvm/llvm-project/pull/112882

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RBLegalize rules for load (PR #112865)

2024-10-18 Thread Petar Avramovic via llvm-branch-commits


https://github.com/petar-avramovic closed 
https://github.com/llvm/llvm-project/pull/112865

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RBLegalize rules for load (PR #112882)

2024-10-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Petar Avramovic (petar-avramovic)


Changes

Add IDs for bit widths that cover multiple LLTs: B32, B64, etc.
Add a "Predicate" wrapper class for bool predicate functions, used to
write readable rules. Predicates can be combined using &&, || and !.
Add lowering for splitting and widening loads.
Write rules for loads so that existing MIR tests from the old
regbankselect do not change.
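
To illustrate the idea of a predicate wrapper that composes with `&&`, `||` and `!`, here is a minimal standalone sketch. It is not the class from the patch (the relevant part of the diff is truncated below): `FakeLoad` and the named predicates are hypothetical, and the real class presumably operates on `MachineInstr`.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for the instruction a rule is matched against.
struct FakeLoad {
  unsigned SizeInBits;
  bool IsUniform;
};

class Predicate {
  std::function<bool(const FakeLoad &)> Pred;

public:
  Predicate(std::function<bool(const FakeLoad &)> P) : Pred(std::move(P)) {}

  bool operator()(const FakeLoad &L) const { return Pred(L); }

  // Each combinator builds a new predicate; evaluation happens later, inside
  // the lambda, where the built-in && and || still short-circuit.
  Predicate operator&&(const Predicate &RHS) const {
    auto A = Pred, B = RHS.Pred;
    return Predicate([A, B](const FakeLoad &L) { return A(L) && B(L); });
  }
  Predicate operator||(const Predicate &RHS) const {
    auto A = Pred, B = RHS.Pred;
    return Predicate([A, B](const FakeLoad &L) { return A(L) || B(L); });
  }
  Predicate operator!() const {
    auto A = Pred;
    return Predicate([A](const FakeLoad &L) { return !A(L); });
  }
};

// Rules can then read close to their informal description.
const Predicate IsUniform([](const FakeLoad &L) { return L.IsUniform; });
const Predicate IsB32([](const FakeLoad &L) { return L.SizeInBits == 32; });
const Predicate UniformB32 = IsUniform && IsB32;
const Predicate DivergentOrB32 = !IsUniform || IsB32;

void predicateExample() {
  assert(UniformB32(FakeLoad{32, true}));
  assert(!UniformB32(FakeLoad{64, true}));
}
```

Overloading `&&` and `||` gives up short-circuiting at the point where predicates are combined, which is harmless here because combining only constructs a closure and evaluates nothing.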

---

Patch is 81.23 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/112882.diff


6 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPURBLegalizeHelper.cpp (+297-5) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPURBLegalizeHelper.h (+4-3) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPURBLegalizeRules.cpp (+300-7) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPURBLegalizeRules.h (+63-2) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-load.mir 
(+271-49) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-zextload.mir 
(+7-2) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURBLegalizeHelper.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURBLegalizeHelper.cpp
index a0f6ecedab7a83..f58f0a315096d2 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURBLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURBLegalizeHelper.cpp
@@ -37,6 +37,97 @@ bool 
RegBankLegalizeHelper::findRuleAndApplyMapping(MachineInstr &MI) {
   return true;
 }
 
+void RegBankLegalizeHelper::splitLoad(MachineInstr &MI,
+  ArrayRef LLTBreakdown, LLT MergeTy) 
{
+  MachineFunction &MF = B.getMF();
+  assert(MI.getNumMemOperands() == 1);
+  MachineMemOperand &BaseMMO = **MI.memoperands_begin();
+  Register Dst = MI.getOperand(0).getReg();
+  const RegisterBank *DstRB = MRI.getRegBankOrNull(Dst);
+  Register BasePtrReg = MI.getOperand(1).getReg();
+  LLT PtrTy = MRI.getType(BasePtrReg);
+  const RegisterBank *PtrRB = MRI.getRegBankOrNull(BasePtrReg);
+  LLT OffsetTy = LLT::scalar(PtrTy.getSizeInBits());
+  SmallVector LoadPartRegs;
+
+  unsigned ByteOffset = 0;
+  for (LLT PartTy : LLTBreakdown) {
+Register BasePtrPlusOffsetReg;
+if (ByteOffset == 0) {
+  BasePtrPlusOffsetReg = BasePtrReg;
+} else {
+  BasePtrPlusOffsetReg = MRI.createVirtualRegister({PtrRB, PtrTy});
+  Register OffsetReg = MRI.createVirtualRegister({PtrRB, OffsetTy});
+  B.buildConstant(OffsetReg, ByteOffset);
+  B.buildPtrAdd(BasePtrPlusOffsetReg, BasePtrReg, OffsetReg);
+}
+MachineMemOperand *BasePtrPlusOffsetMMO =
+MF.getMachineMemOperand(&BaseMMO, ByteOffset, PartTy);
+Register PartLoad = MRI.createVirtualRegister({DstRB, PartTy});
+B.buildLoad(PartLoad, BasePtrPlusOffsetReg, *BasePtrPlusOffsetMMO);
+LoadPartRegs.push_back(PartLoad);
+ByteOffset += PartTy.getSizeInBytes();
+  }
+
+  if (!MergeTy.isValid()) {
+// Loads are of same size, concat or merge them together.
+B.buildMergeLikeInstr(Dst, LoadPartRegs);
+  } else {
+// Load(s) are not all of same size, need to unmerge them to smaller pieces
+// of MergeTy type, then merge them all together in Dst.
+SmallVector MergeTyParts;
+for (Register Reg : LoadPartRegs) {
+  if (MRI.getType(Reg) == MergeTy) {
+MergeTyParts.push_back(Reg);
+  } else {
+auto Unmerge = B.buildUnmerge(MergeTy, Reg);
+for (unsigned i = 0; i < Unmerge->getNumOperands() - 1; ++i) {
+  Register UnmergeReg = Unmerge->getOperand(i).getReg();
+  MRI.setRegBank(UnmergeReg, *DstRB);
+  MergeTyParts.push_back(UnmergeReg);
+}
+  }
+}
+B.buildMergeLikeInstr(Dst, MergeTyParts);
+  }
+  MI.eraseFromParent();
+}
+
+void RegBankLegalizeHelper::widenLoad(MachineInstr &MI, LLT WideTy,
+  LLT MergeTy) {
+  MachineFunction &MF = B.getMF();
+  assert(MI.getNumMemOperands() == 1);
+  MachineMemOperand &BaseMMO = **MI.memoperands_begin();
+  Register Dst = MI.getOperand(0).getReg();
+  const RegisterBank *DstRB = MRI.getRegBankOrNull(Dst);
+  Register BasePtrReg = MI.getOperand(1).getReg();
+
+  Register BasePtrPlusOffsetReg;
+  BasePtrPlusOffsetReg = BasePtrReg;
+
+  MachineMemOperand *BasePtrPlusOffsetMMO =
+  MF.getMachineMemOperand(&BaseMMO, 0, WideTy);
+  Register WideLoad = MRI.createVirtualRegister({DstRB, WideTy});
+  B.buildLoad(WideLoad, BasePtrPlusOffsetReg, *BasePtrPlusOffsetMMO);
+
+  if (WideTy.isScalar()) {
+B.buildTrunc(Dst, WideLoad);
+  } else {
+SmallVector MergeTyParts;
+unsigned NumEltsMerge =
+MRI.getType(Dst).getSizeInBits() / MergeTy.getSizeInBits();
+auto Unmerge = B.buildUnmerge(MergeTy, WideLoad);
+for (unsigned i = 0; i < Unmerge->getNumOperands() - 1; ++i) {
+  Register UnmergeReg = Unmerge->getOperand(i).getReg();
+  MRI.setRegBank(UnmergeReg, *DstRB);
+  if (i < NumEltsMerge)
+MergeTyParts.push_back(UnmergeReg);
+}
+B.buildMergeLikeInstr(Dst, MergeTyParts);
+  }
+  MI.eraseFromPare

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RBLegalize rules for load (PR #112865)

2024-10-18 Thread Petar Avramovic via llvm-branch-commits


petar-avramovic wrote:

Ignore this, see https://github.com/llvm/llvm-project/pull/112882

https://github.com/llvm/llvm-project/pull/112865

[llvm-branch-commits] [llvm] MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi (PR #112866)

2024-10-18 Thread Jay Foad via llvm-branch-commits


https://github.com/jayfoad approved this pull request.

> Change existing code to match what LLVM-IR version is doing

Yeah, looks reasonable to me.

https://github.com/llvm/llvm-project/pull/112866

[llvm-branch-commits] [clang] release/19.x: [clang-format] Handle template opener/closer in braced list (#112494) (PR #112815)

2024-10-18 Thread via llvm-branch-commits


https://github.com/mydeveloperday approved this pull request.


https://github.com/llvm/llvm-project/pull/112815

[llvm-branch-commits] [clang] [Serialization] Code cleanups and polish 83233 (PR #83237)

2024-10-18 Thread Ilya Biryukov via llvm-branch-commits


ilya-biryukov wrote:

We've hit only one correctness issue that we don't yet have a reproducer for, 
but it gives this error:
```shell
Eigen/.../plugins/CommonCwiseBinaryOps.inc:47:29: error: inline function 
'Eigen::operator*' is not defined [-Werror,-Wundefined-inline]
```
I'll be on vacation for the next two weeks; @usx95 may be able to provide a 
reproducer in the meantime.
Maybe you'll have ideas on why it's happening even without a reproducer.

More importantly, we have seen quite a few compile actions becoming much slower 
and causing timeouts.
@alexfh is trying to profile the compile actions to find which functions are 
the bottleneck and will report the results here.

https://github.com/llvm/llvm-project/pull/83237

[llvm-branch-commits] [lld] [PAC][lld] Do not emit warnings for `-z pac-plt` with valid PAuth core info (PR #112959)

2024-10-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-lld-elf

Author: Daniil Kovalev (kovdan01)


Changes

When PAuth core info is present and (platform,version) is not (0,0),
treat input files as pac-enabled and do not emit a warning with
`-z pac-plt` passed.
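
The condition described above can be sketched in isolation as follows. This mirrors the logic visible in the diff below but uses plain standard-library types; `hasValidPauthCoreInfo`, `shouldWarnPacPlt` and `pacPltExample` are hypothetical names for illustration, not lld functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Core info counts as valid when it is present and not all-zero, which is
// how the diff checks for "(platform,version) is not (0,0)".
bool hasValidPauthCoreInfo(const std::vector<uint8_t> &CoreInfo) {
  return !CoreInfo.empty() &&
         std::any_of(CoreInfo.begin(), CoreInfo.end(),
                     [](uint8_t C) { return C != 0; });
}

// Warn only when -z pac-plt is requested and the file has neither the PAC
// GNU property nor valid PAuth core info for the link job.
bool shouldWarnPacPlt(bool ZPacPlt, bool FileHasPacProperty,
                      const std::vector<uint8_t> &CoreInfo) {
  return ZPacPlt && !(hasValidPauthCoreInfo(CoreInfo) || FileHasPacProperty);
}

void pacPltExample() {
  const std::vector<uint8_t> Zero(16, 0); // all-zero core info
  std::vector<uint8_t> Tagged(16, 0);
  Tagged[0] = 1; // some non-zero byte
  assert(shouldWarnPacPlt(true, false, Zero));    // still warns
  assert(!shouldWarnPacPlt(true, false, Tagged)); // warning suppressed
}
```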

---
Full diff: https://github.com/llvm/llvm-project/pull/112959.diff


3 Files Affected:

- (modified) lld/ELF/Driver.cpp (+8-2) 
- (modified) lld/test/ELF/aarch64-feature-pac.s (+1-1) 
- (modified) lld/test/ELF/aarch64-feature-pauth.s (+55-3) 


``diff
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index fb77e67e9fc5ca..c436be6b24e001 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -2753,6 +2753,10 @@ static void readSecurityNotes(Ctx &ctx) {
   referenceFileName = (*it)->getName();
 }
   }
+  bool hasValidPauthAbiCoreInfo =
+  (!ctx.aarch64PauthAbiCoreInfo.empty() &&
+   llvm::any_of(ctx.aarch64PauthAbiCoreInfo,
+[](uint8_t c) { return c != 0; }));
 
   for (ELFFileBase *f : ctx.objectFiles) {
 uint32_t features = f->andFeatures;
@@ -2789,9 +2793,11 @@ static void readSecurityNotes(Ctx &ctx) {
"GNU_PROPERTY_X86_FEATURE_1_IBT property");
   features |= GNU_PROPERTY_X86_FEATURE_1_IBT;
 }
-if (ctx.arg.zPacPlt && !(features & GNU_PROPERTY_AARCH64_FEATURE_1_PAC)) {
+if (ctx.arg.zPacPlt && !(hasValidPauthAbiCoreInfo ||
+ (features & GNU_PROPERTY_AARCH64_FEATURE_1_PAC))) 
{
   warn(toString(f) + ": -z pac-plt: file does not have "
- "GNU_PROPERTY_AARCH64_FEATURE_1_PAC property");
+ "GNU_PROPERTY_AARCH64_FEATURE_1_PAC property and no "
+ "valid PAuth core info present for this link job");
   features |= GNU_PROPERTY_AARCH64_FEATURE_1_PAC;
 }
 ctx.arg.andFeatures &= features;
diff --git a/lld/test/ELF/aarch64-feature-pac.s 
b/lld/test/ELF/aarch64-feature-pac.s
index b85a33216cb5bd..4fd1fd2acea737 100644
--- a/lld/test/ELF/aarch64-feature-pac.s
+++ b/lld/test/ELF/aarch64-feature-pac.s
@@ -82,7 +82,7 @@
 
 # RUN: ld.lld %t.o %t2.o -z pac-plt %t.so -o %tpacplt.exe 2>&1 | FileCheck 
-DFILE=%t2.o --check-prefix WARN %s
 
-# WARN: warning: [[FILE]]: -z pac-plt: file does not have 
GNU_PROPERTY_AARCH64_FEATURE_1_PAC property
+# WARN: warning: [[FILE]]: -z pac-plt: file does not have 
GNU_PROPERTY_AARCH64_FEATURE_1_PAC property and no valid PAuth core info 
present for this link job
 
 # RUN: llvm-readelf -n %tpacplt.exe | FileCheck --check-prefix=PACPROP %s
 # RUN: llvm-readelf --dynamic-table %tpacplt.exe | FileCheck --check-prefix 
PACDYN2 %s
diff --git a/lld/test/ELF/aarch64-feature-pauth.s 
b/lld/test/ELF/aarch64-feature-pauth.s
index 699a650d72295a..c11073dba86f24 100644
--- a/lld/test/ELF/aarch64-feature-pauth.s
+++ b/lld/test/ELF/aarch64-feature-pauth.s
@@ -33,13 +33,53 @@
 # RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu no-info.s -o noinfo1.o
 # RUN: cp noinfo1.o noinfo2.o
 # RUN: not ld.lld -z pauth-report=error noinfo1.o tag1.o noinfo2.o -o 
/dev/null 2>&1 | FileCheck --check-prefix ERR5 %s
-# RUN: ld.lld -z pauth-report=warning noinfo1.o tag1.o noinfo2.o -o /dev/null 
2>&1 | FileCheck --check-prefix WARN %s
+# RUN: ld.lld -z pauth-report=warning noinfo1.o tag1.o noinfo2.o -o /dev/null 
2>&1 | FileCheck --check-prefix WARN1 %s
 # RUN: ld.lld -z pauth-report=none noinfo1.o tag1.o noinfo2.o --fatal-warnings 
-o /dev/null
 
 # ERR5:  error: noinfo1.o: -z pauth-report: file does not have AArch64 
PAuth core info while 'tag1.o' has one
 # ERR5-NEXT: error: noinfo2.o: -z pauth-report: file does not have AArch64 
PAuth core info while 'tag1.o' has one
-# WARN:  warning: noinfo1.o: -z pauth-report: file does not have AArch64 
PAuth core info while 'tag1.o' has one
-# WARN-NEXT: warning: noinfo2.o: -z pauth-report: file does not have AArch64 
PAuth core info while 'tag1.o' has one
+# WARN1:  warning: noinfo1.o: -z pauth-report: file does not have AArch64 
PAuth core info while 'tag1.o' has one
+# WARN1-NEXT: warning: noinfo2.o: -z pauth-report: file does not have AArch64 
PAuth core info while 'tag1.o' has one
+
+# RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu abi-tag-zero.s  
  -o tag-zero.o
+# RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu 
%p/Inputs/aarch64-func2.s -o func2.o
+# RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu 
%p/Inputs/aarch64-func3.s -o func3.o
+# RUN: ld.lld func3.o --shared -o func3.so
+# RUN: ld.lld tag1.o func2.o func3.so -z pac-plt --shared -o pacplt-nowarn 
--fatal-warnings
+# RUN: ld.lld tag-zero.o func2.o func3.so -z pac-plt --shared -o pacplt-warn 
2>&1 | FileCheck --check-prefix WARN2 %s
+
+# WARN2:  warning: tag-zero.o: -z pac-plt: file does not have 
GNU_PROPERTY_AARCH64_FEATURE_1_PAC property and no valid PAuth core info 
present for this link job
+# WARN2-NEXT: warning: func2.o: -z pac-plt: file does not have 
GNU_PROPERTY_AARCH64_FEATURE_1_PAC property and no vali

[llvm-branch-commits] [flang] [flang][cuda] Translate cuf.register_kernel and cuf.register_module (PR #112972)

2024-10-18 Thread via llvm-branch-commits


llvmbot wrote:



@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-runtime

Author: Valentin Clement (バレンタイン クレメン) (clementval)


Changes

Add LLVM IR Translation for `cuf.register_module` and `cuf.register_kernel`. 
These are lowered to function calls to the CUF runtime entry points.

---
Full diff: https://github.com/llvm/llvm-project/pull/112972.diff


8 Files Affected:

- (added) flang/include/flang/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.h 
(+29) 
- (modified) flang/include/flang/Optimizer/Support/InitFIR.h (+2) 
- (added) flang/include/flang/Runtime/CUDA/registration.h (+28) 
- (modified) flang/lib/Optimizer/Dialect/CUF/CMakeLists.txt (+1) 
- (added) flang/lib/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.cpp (+104) 
- (modified) flang/lib/Optimizer/Transforms/CufOpConversion.cpp (+1) 
- (modified) flang/runtime/CUDA/CMakeLists.txt (+1) 
- (added) flang/runtime/CUDA/registration.cpp (+31) 


``diff
diff --git a/flang/include/flang/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.h 
b/flang/include/flang/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.h
new file mode 100644
index 00..f3edb7fca649d0
--- /dev/null
+++ b/flang/include/flang/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.h
@@ -0,0 +1,29 @@
+//===- CUFToLLVMIRTranslation.h - CUF Dialect to LLVM IR *- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This provides registration calls for GPU dialect to LLVM IR translation.
+//
+//===--===//
+
+#ifndef FLANG_OPTIMIZER_DIALECT_CUF_GPUTOLLVMIRTRANSLATION_H_
+#define FLANG_OPTIMIZER_DIALECT_CUF_GPUTOLLVMIRTRANSLATION_H_
+
+namespace mlir {
+class DialectRegistry;
+class MLIRContext;
+} // namespace mlir
+
+namespace cuf {
+
+/// Register the CUF dialect and the translation from it to the LLVM IR in
+/// the given registry.
+void registerCUFDialectTranslation(mlir::DialectRegistry &registry);
+
+} // namespace cuf
+
+#endif // FLANG_OPTIMIZER_DIALECT_CUF_GPUTOLLVMIRTRANSLATION_H_
diff --git a/flang/include/flang/Optimizer/Support/InitFIR.h 
b/flang/include/flang/Optimizer/Support/InitFIR.h
index 04a5dd323e5508..1c61c367199923 100644
--- a/flang/include/flang/Optimizer/Support/InitFIR.h
+++ b/flang/include/flang/Optimizer/Support/InitFIR.h
@@ -14,6 +14,7 @@
 #define FORTRAN_OPTIMIZER_SUPPORT_INITFIR_H
 
 #include "flang/Optimizer/Dialect/CUF/CUFDialect.h"
+#include "flang/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.h"
 #include "flang/Optimizer/Dialect/FIRDialect.h"
 #include "flang/Optimizer/HLFIR/HLFIRDialect.h"
 #include "mlir/Conversion/Passes.h"
@@ -61,6 +62,7 @@ inline void addFIRExtensions(mlir::DialectRegistry &registry,
   if (addFIRInlinerInterface)
 addFIRInlinerExtension(registry);
   addFIRToLLVMIRExtension(registry);
+  cuf::registerCUFDialectTranslation(registry);
 }
 
 inline void loadNonCodegenDialects(mlir::MLIRContext &context) {
diff --git a/flang/include/flang/Runtime/CUDA/registration.h 
b/flang/include/flang/Runtime/CUDA/registration.h
new file mode 100644
index 00..cbe202c4d23e0d
--- /dev/null
+++ b/flang/include/flang/Runtime/CUDA/registration.h
@@ -0,0 +1,28 @@
+//===-- include/flang/Runtime/CUDA/registration.h ---*- C -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef FORTRAN_RUNTIME_CUDA_REGISTRATION_H_
+#define FORTRAN_RUNTIME_CUDA_REGISTRATION_H_
+
+#include "flang/Runtime/entry-names.h"
+#include 
+
+namespace Fortran::runtime::cuda {
+
+extern "C" {
+
+/// Register a CUDA module.
+void *RTDECL(CUFRegisterModule)(void *data);
+
+/// Register a device function.
+void RTDECL(CUFRegisterFunction)(void **module, const char *fct);
+
+} // extern "C"
+
+} // namespace Fortran::runtime::cuda
+#endif // FORTRAN_RUNTIME_CUDA_REGISTRATION_H_
diff --git a/flang/lib/Optimizer/Dialect/CUF/CMakeLists.txt 
b/flang/lib/Optimizer/Dialect/CUF/CMakeLists.txt
index b222115d58..5d4bd0785971f7 100644
--- a/flang/lib/Optimizer/Dialect/CUF/CMakeLists.txt
+++ b/flang/lib/Optimizer/Dialect/CUF/CMakeLists.txt
@@ -3,6 +3,7 @@ add_subdirectory(Attributes)
 add_flang_library(CUFDialect
   CUFDialect.cpp
   CUFOps.cpp
+  CUFToLLVMIRTranslation.cpp
 
   DEPENDS
   MLIRIR
diff --git a/flang/lib/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.cpp 
b/flang/lib/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.cpp
new file mode 100644
index 00..c6c9f96b811352
--- /dev/null
+++ b/flang/lib/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.cpp
@@ -0,0 +1,10

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RBSelect (PR #112863)

2024-10-18 Thread Diana Picus via llvm-branch-commits



@@ -26,6 +28,26 @@ std::pair
 getBaseWithConstantOffset(MachineRegisterInfo &MRI, Register Reg,
   GISelKnownBits *KnownBits = nullptr,
   bool CheckNUW = false);
+
+// Currently finds S32/S64 lane masks that can be declared as divergent by
+// uniformity analysis (all are phis at the moment).
+// These are defined as i32/i64 in some IR intrinsics (not as i1).
+// Tablegen forces(via telling that lane mask IR intrinsics are uniform) most of
+// S32/S64 lane masks to be uniform, as this results in them ending up with sgpr
+// reg class after instruction-select don't search for all of them.
+class IntrinsicLaneMaskAnalyzer {
+  DenseSet S32S64LaneMask;
+  MachineRegisterInfo &MRI;
+
+public:
+  IntrinsicLaneMaskAnalyzer(MachineFunction &MF);
+  bool isS32S64LaneMask(Register Reg);
+
+private:
+  void initLaneMaskIntrinsics(MachineFunction &MF);
+  // This will not be needed when we turn of LCSSA for global-isel.

rovka wrote:

```suggestion
  // This will not be needed when we turn off LCSSA for global-isel.
```

https://github.com/llvm/llvm-project/pull/112863

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RBSelect (PR #112863)

2024-10-18 Thread Diana Picus via llvm-branch-commits



@@ -63,4 +70,189 @@ char &llvm::AMDGPURBSelectID = AMDGPURBSelect::ID;
 
 FunctionPass *llvm::createAMDGPURBSelectPass() { return new AMDGPURBSelect(); }
 
-bool AMDGPURBSelect::runOnMachineFunction(MachineFunction &MF) { return true; }
+bool shouldRBSelect(MachineInstr &MI) {
+  if (isTargetSpecificOpcode(MI.getOpcode()) && !MI.isPreISelOpcode())
+return false;
+
+  if (MI.getOpcode() == AMDGPU::PHI || MI.getOpcode() == AMDGPU::IMPLICIT_DEF)
+return false;
+
+  if (MI.isInlineAsm())
+return false;
+
+  return true;
+}
+
+void setRB(MachineInstr &MI, MachineOperand &DefOP, MachineIRBuilder B,
+   MachineRegisterInfo &MRI, const RegisterBank &RB) {
+  Register Reg = DefOP.getReg();
+  // Register that already has Register class got it during pre-inst selection
+  // of another instruction. Maybe cross bank copy was required so we insert a
+  // copy trat can be removed later. This simplifies post-rb-legalize artifact

rovka wrote:

```suggestion
  // copy that can be removed later. This simplifies post-rb-legalize artifact
```

https://github.com/llvm/llvm-project/pull/112863

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RBSelect (PR #112863)

2024-10-18 Thread Diana Picus via llvm-branch-commits



@@ -63,4 +70,189 @@ char &llvm::AMDGPURBSelectID = AMDGPURBSelect::ID;
 
 FunctionPass *llvm::createAMDGPURBSelectPass() { return new AMDGPURBSelect(); }
 
-bool AMDGPURBSelect::runOnMachineFunction(MachineFunction &MF) { return true; }
+bool shouldRBSelect(MachineInstr &MI) {
+  if (isTargetSpecificOpcode(MI.getOpcode()) && !MI.isPreISelOpcode())
+return false;
+
+  if (MI.getOpcode() == AMDGPU::PHI || MI.getOpcode() == AMDGPU::IMPLICIT_DEF)
+return false;
+
+  if (MI.isInlineAsm())
+return false;
+
+  return true;
+}
+
+void setRB(MachineInstr &MI, MachineOperand &DefOP, MachineIRBuilder B,
+   MachineRegisterInfo &MRI, const RegisterBank &RB) {
+  Register Reg = DefOP.getReg();
+  // Register that already has Register class got it during pre-inst selection
+  // of another instruction. Maybe cross bank copy was required so we insert a
+  // copy trat can be removed later. This simplifies post-rb-legalize artifact
+  // combiner and avoids need to special case some patterns.
+  if (MRI.getRegClassOrNull(Reg)) {
+LLT Ty = MRI.getType(Reg);
+Register NewReg = MRI.createVirtualRegister({&RB, Ty});
+DefOP.setReg(NewReg);
+
+auto &MBB = *MI.getParent();
+B.setInsertPt(MBB, MI.isPHI() ? MBB.getFirstNonPHI()
+  : std::next(MI.getIterator()));
+B.buildCopy(Reg, NewReg);
+
+// The problem was discoverd for uniform S1 that was used as both
+// lane mask(vcc) and regular sgpr S1.
+// - lane-mask(vcc) use was by si_if, this use is divergent and requires
+//   non-trivial sgpr-S1-to-vcc copy. But pre-inst-selection of si_if sets
+//   sreg_64_xexec(S1) on def of uniform S1 making it lane-mask.
+// - the regular regular sgpr S1(uniform) instruction is now broken since
+//   it uses sreg_64_xexec(S1) which is divergent.
+
+// "Clear" reg classes from uses on generic instructions and but register

rovka wrote:

```suggestion
// "Clear" reg classes from uses on generic instructions and put register
```

https://github.com/llvm/llvm-project/pull/112863

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RBSelect (PR #112863)

2024-10-18 Thread Diana Picus via llvm-branch-commits



@@ -63,4 +70,189 @@ char &llvm::AMDGPURBSelectID = AMDGPURBSelect::ID;
 
 FunctionPass *llvm::createAMDGPURBSelectPass() { return new AMDGPURBSelect(); }
 
-bool AMDGPURBSelect::runOnMachineFunction(MachineFunction &MF) { return true; }
+bool shouldRBSelect(MachineInstr &MI) {
+  if (isTargetSpecificOpcode(MI.getOpcode()) && !MI.isPreISelOpcode())
+return false;
+
+  if (MI.getOpcode() == AMDGPU::PHI || MI.getOpcode() == AMDGPU::IMPLICIT_DEF)
+return false;
+
+  if (MI.isInlineAsm())
+return false;
+
+  return true;
+}
+
+void setRB(MachineInstr &MI, MachineOperand &DefOP, MachineIRBuilder B,

rovka wrote:

```suggestion
void setRBDef(MachineInstr &MI, MachineOperand &DefOp, MachineIRBuilder B,
```

https://github.com/llvm/llvm-project/pull/112863

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RBSelect (PR #112863)

2024-10-18 Thread Diana Picus via llvm-branch-commits



@@ -63,4 +70,189 @@ char &llvm::AMDGPURBSelectID = AMDGPURBSelect::ID;
 
 FunctionPass *llvm::createAMDGPURBSelectPass() { return new AMDGPURBSelect(); }
 
-bool AMDGPURBSelect::runOnMachineFunction(MachineFunction &MF) { return true; }
+bool shouldRBSelect(MachineInstr &MI) {
+  if (isTargetSpecificOpcode(MI.getOpcode()) && !MI.isPreISelOpcode())
+return false;
+
+  if (MI.getOpcode() == AMDGPU::PHI || MI.getOpcode() == AMDGPU::IMPLICIT_DEF)
+return false;
+
+  if (MI.isInlineAsm())
+return false;
+
+  return true;
+}
+
+void setRB(MachineInstr &MI, MachineOperand &DefOP, MachineIRBuilder B,
+   MachineRegisterInfo &MRI, const RegisterBank &RB) {
+  Register Reg = DefOP.getReg();
+  // Register that already has Register class got it during pre-inst selection
+  // of another instruction. Maybe cross bank copy was required so we insert a
+  // copy trat can be removed later. This simplifies post-rb-legalize artifact
+  // combiner and avoids need to special case some patterns.
+  if (MRI.getRegClassOrNull(Reg)) {
+LLT Ty = MRI.getType(Reg);
+Register NewReg = MRI.createVirtualRegister({&RB, Ty});
+DefOP.setReg(NewReg);
+
+auto &MBB = *MI.getParent();
+B.setInsertPt(MBB, MI.isPHI() ? MBB.getFirstNonPHI()
+  : std::next(MI.getIterator()));

rovka wrote:

Can you use skipPHIsAndLabels(std::next...) here too?
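
For context on the suggestion above, here is a minimal standalone sketch of what skipping PHIs and labels from a starting position means. It models a basic block as a `std::vector` of toy instructions. `ToyInst`, `Kind`, and `skipPhisAndLabels` are hypothetical names used only for illustration; this is not the LLVM `MachineBasicBlock` API, and the real helper may treat further instruction kinds specially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction kinds; hypothetical, not LLVM opcodes.
enum class Kind { Phi, Label, Other };

struct ToyInst {
  Kind kind;
};

// Returns the index of the first instruction at or after `start` that is
// neither a PHI nor a label, or block.size() if there is none.
std::size_t skipPhisAndLabels(const std::vector<ToyInst> &block,
                              std::size_t start) {
  assert(start <= block.size() && "start must be within the block");
  std::size_t i = start;
  while (i < block.size() &&
         (block[i].kind == Kind::Phi || block[i].kind == Kind::Label))
    ++i;
  return i;
}
```

With a helper of this shape, the insertion point after a PHI and after an ordinary instruction can be computed by the same call, which would make the `MI.isPHI()` special case in the quoted code unnecessary.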

https://github.com/llvm/llvm-project/pull/112863

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RBSelect (PR #112863)

2024-10-18 Thread Diana Picus via llvm-branch-commits



@@ -63,4 +70,189 @@ char &llvm::AMDGPURBSelectID = AMDGPURBSelect::ID;
 
 FunctionPass *llvm::createAMDGPURBSelectPass() { return new AMDGPURBSelect(); }
 
-bool AMDGPURBSelect::runOnMachineFunction(MachineFunction &MF) { return true; }
+bool shouldRBSelect(MachineInstr &MI) {
+  if (isTargetSpecificOpcode(MI.getOpcode()) && !MI.isPreISelOpcode())
+return false;
+
+  if (MI.getOpcode() == AMDGPU::PHI || MI.getOpcode() == AMDGPU::IMPLICIT_DEF)
+return false;
+
+  if (MI.isInlineAsm())
+return false;
+
+  return true;
+}
+
+void setRB(MachineInstr &MI, MachineOperand &DefOP, MachineIRBuilder B,
+   MachineRegisterInfo &MRI, const RegisterBank &RB) {
+  Register Reg = DefOP.getReg();
+  // Register that already has Register class got it during pre-inst selection
+  // of another instruction. Maybe cross bank copy was required so we insert a
+  // copy trat can be removed later. This simplifies post-rb-legalize artifact
+  // combiner and avoids need to special case some patterns.
+  if (MRI.getRegClassOrNull(Reg)) {
+LLT Ty = MRI.getType(Reg);
+Register NewReg = MRI.createVirtualRegister({&RB, Ty});
+DefOP.setReg(NewReg);
+
+auto &MBB = *MI.getParent();
+B.setInsertPt(MBB, MI.isPHI() ? MBB.getFirstNonPHI()
+  : std::next(MI.getIterator()));
+B.buildCopy(Reg, NewReg);
+
+// The problem was discoverd for uniform S1 that was used as both

rovka wrote:

```suggestion
// The problem was discovered for uniform S1 that was used as both
```

https://github.com/llvm/llvm-project/pull/112863

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RBSelect (PR #112863)

2024-10-18 Thread Diana Picus via llvm-branch-commits



@@ -63,4 +70,189 @@ char &llvm::AMDGPURBSelectID = AMDGPURBSelect::ID;
 
 FunctionPass *llvm::createAMDGPURBSelectPass() { return new AMDGPURBSelect(); }
 
-bool AMDGPURBSelect::runOnMachineFunction(MachineFunction &MF) { return true; }
+bool shouldRBSelect(MachineInstr &MI) {
+  if (isTargetSpecificOpcode(MI.getOpcode()) && !MI.isPreISelOpcode())
+return false;
+
+  if (MI.getOpcode() == AMDGPU::PHI || MI.getOpcode() == AMDGPU::IMPLICIT_DEF)
+return false;
+
+  if (MI.isInlineAsm())
+return false;
+
+  return true;
+}
+
+void setRB(MachineInstr &MI, MachineOperand &DefOP, MachineIRBuilder B,
+   MachineRegisterInfo &MRI, const RegisterBank &RB) {
+  Register Reg = DefOP.getReg();
+  // Register that already has Register class got it during pre-inst selection
+  // of another instruction. Maybe cross bank copy was required so we insert a
+  // copy trat can be removed later. This simplifies post-rb-legalize artifact
+  // combiner and avoids need to special case some patterns.
+  if (MRI.getRegClassOrNull(Reg)) {
+LLT Ty = MRI.getType(Reg);
+Register NewReg = MRI.createVirtualRegister({&RB, Ty});
+DefOP.setReg(NewReg);
+
+auto &MBB = *MI.getParent();
+B.setInsertPt(MBB, MI.isPHI() ? MBB.getFirstNonPHI()
+  : std::next(MI.getIterator()));
+B.buildCopy(Reg, NewReg);
+
+// The problem was discoverd for uniform S1 that was used as both
+// lane mask(vcc) and regular sgpr S1.
+// - lane-mask(vcc) use was by si_if, this use is divergent and requires
+//   non-trivial sgpr-S1-to-vcc copy. But pre-inst-selection of si_if sets
+//   sreg_64_xexec(S1) on def of uniform S1 making it lane-mask.
+// - the regular regular sgpr S1(uniform) instruction is now broken since
+//   it uses sreg_64_xexec(S1) which is divergent.
+
+// "Clear" reg classes from uses on generic instructions and but register
+// banks instead.
+for (auto &UseMI : MRI.use_instructions(Reg)) {
+  if (shouldRBSelect(UseMI)) {
+for (MachineOperand &Op : UseMI.operands()) {
+  if (Op.isReg() && Op.isUse() && Op.getReg() == Reg)

rovka wrote:

Do we really need to check isUse?
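
To make the question concrete, here is a small standalone model of the operand filter in the quoted loop. `ToyOperand` and `matchingOperands` are hypothetical stand-ins, not LLVM types, and what the real loop does with each matching operand is cut off in the quoted hunk, so the sketch only collects indices.

```cpp
#include <cstddef>
#include <vector>

// Toy operand: hypothetical stand-in for a machine operand.
struct ToyOperand {
  bool isReg;   // operand refers to a register
  bool isDef;   // operand defines (writes) the register
  unsigned reg; // register number, meaningful only when isReg is true
};

// Returns the indices of operands that refer to `reg`. When `requireUse`
// is true, operands that define `reg` are excluded, mirroring the
// `Op.isUse()` condition under discussion.
std::vector<std::size_t> matchingOperands(const std::vector<ToyOperand> &ops,
                                          unsigned reg, bool requireUse) {
  std::vector<std::size_t> result;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    const ToyOperand &op = ops[i];
    if (!op.isReg || op.reg != reg)
      continue;
    if (requireUse && op.isDef)
      continue;
    result.push_back(i);
  }
  return result;
}
```

The two modes differ only when a single instruction both defines and reads the same register. Whether that can happen for the instructions visited here is the open point, and the sketch does not settle it.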

https://github.com/llvm/llvm-project/pull/112863

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RBSelect (PR #112863)

2024-10-18 Thread Diana Picus via llvm-branch-commits



@@ -63,4 +70,189 @@ char &llvm::AMDGPURBSelectID = AMDGPURBSelect::ID;
 
 FunctionPass *llvm::createAMDGPURBSelectPass() { return new AMDGPURBSelect(); }
 
-bool AMDGPURBSelect::runOnMachineFunction(MachineFunction &MF) { return true; }
+bool shouldRBSelect(MachineInstr &MI) {
+  if (isTargetSpecificOpcode(MI.getOpcode()) && !MI.isPreISelOpcode())
+return false;
+
+  if (MI.getOpcode() == AMDGPU::PHI || MI.getOpcode() == AMDGPU::IMPLICIT_DEF)
+return false;
+
+  if (MI.isInlineAsm())
+return false;
+
+  return true;
+}
+
+void setRB(MachineInstr &MI, MachineOperand &DefOP, MachineIRBuilder B,
+   MachineRegisterInfo &MRI, const RegisterBank &RB) {
+  Register Reg = DefOP.getReg();
+  // Register that already has Register class got it during pre-inst selection
+  // of another instruction. Maybe cross bank copy was required so we insert a
+  // copy trat can be removed later. This simplifies post-rb-legalize artifact
+  // combiner and avoids need to special case some patterns.
+  if (MRI.getRegClassOrNull(Reg)) {
+LLT Ty = MRI.getType(Reg);
+Register NewReg = MRI.createVirtualRegister({&RB, Ty});
+DefOP.setReg(NewReg);
+
+auto &MBB = *MI.getParent();
+B.setInsertPt(MBB, MI.isPHI() ? MBB.getFirstNonPHI()
+  : std::next(MI.getIterator()));
+B.buildCopy(Reg, NewReg);
+
+// The problem was discoverd for uniform S1 that was used as both
+// lane mask(vcc) and regular sgpr S1.
+// - lane-mask(vcc) use was by si_if, this use is divergent and requires
+//   non-trivial sgpr-S1-to-vcc copy. But pre-inst-selection of si_if sets
+//   sreg_64_xexec(S1) on def of uniform S1 making it lane-mask.
+// - the regular regular sgpr S1(uniform) instruction is now broken since

rovka wrote:

```suggestion
// - the regular sgpr S1(uniform) instruction is now broken since
```

https://github.com/llvm/llvm-project/pull/112863

[llvm-branch-commits] [flang] [flang][cuda] Translate cuf.register_kernel and cuf.register_module (PR #112972)

2024-10-18 Thread Valentin Clement バレンタインクレメン via llvm-branch-commits


https://github.com/clementval created 
https://github.com/llvm/llvm-project/pull/112972

Add LLVM IR translation for `cuf.register_module` and `cuf.register_kernel`.
These are lowered to function calls to the CUF runtime entry points.
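
As a rough illustration of what such lowered calls might amount to at run time, the sketch below uses mock entry points with the same parameter shapes as the declarations added in `registration.h` by this patch (`CUFRegisterModule(void *data)` returning `void *`, and `CUFRegisterFunction(void **module, const char *fct)`). The `mock` namespace, `registeredKernels`, `registerAll`, and the call order are assumptions for illustration only. The real runtime bodies and exported symbol names are not modeled.

```cpp
#include <string>
#include <vector>

namespace mock {

// Records kernel names so the call sequence can be inspected.
// Hypothetical bookkeeping; the real runtime registers with CUDA instead.
inline std::vector<std::string> &registeredKernels() {
  static std::vector<std::string> names;
  return names;
}

// Same shape as the declared entry point: takes opaque module data and
// returns an opaque module handle. Here the handle is just the input.
void *CUFRegisterModule(void *data) { return data; }

// Same shape as the declared entry point: takes a pointer to the module
// handle and the kernel name.
void CUFRegisterFunction(void **module, const char *fct) {
  if (module && *module && fct)
    registeredKernels().push_back(fct);
}

} // namespace mock

// One module registration followed by one call per kernel (assumed order).
// Returns the total number of kernels recorded so far.
int registerAll(void *moduleData, const std::vector<const char *> &kernels) {
  void *handle = mock::CUFRegisterModule(moduleData);
  for (const char *name : kernels)
    mock::CUFRegisterFunction(&handle, name);
  return static_cast<int>(mock::registeredKernels().size());
}
```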

>From 49f9aa765e7ac39cf09e2ae12a656aa0c76b7f43 Mon Sep 17 00:00:00 2001
From: Valentin Clement 
Date: Thu, 17 Oct 2024 14:36:04 -0700
Subject: [PATCH] [flang][cuda] Translate cuf.register_kernel and
 cuf.register_module

---
 .../Dialect/CUF/CUFToLLVMIRTranslation.h  |  29 +
 .../include/flang/Optimizer/Support/InitFIR.h |   2 +
 .../include/flang/Runtime/CUDA/registration.h |  28 +
 .../lib/Optimizer/Dialect/CUF/CMakeLists.txt  |   1 +
 .../Dialect/CUF/CUFToLLVMIRTranslation.cpp| 104 ++
 .../Optimizer/Transforms/CufOpConversion.cpp  |   1 +
 flang/runtime/CUDA/CMakeLists.txt |   1 +
 flang/runtime/CUDA/registration.cpp   |  31 ++
 8 files changed, 197 insertions(+)
 create mode 100644 flang/include/flang/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.h
 create mode 100644 flang/include/flang/Runtime/CUDA/registration.h
 create mode 100644 flang/lib/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.cpp
 create mode 100644 flang/runtime/CUDA/registration.cpp

diff --git a/flang/include/flang/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.h b/flang/include/flang/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.h
new file mode 100644
index 00..f3edb7fca649d0
--- /dev/null
+++ b/flang/include/flang/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.h
@@ -0,0 +1,29 @@
+//===- CUFToLLVMIRTranslation.h - CUF Dialect to LLVM IR *- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This provides registration calls for GPU dialect to LLVM IR translation.
+//
+//===--===//
+
+#ifndef FLANG_OPTIMIZER_DIALECT_CUF_GPUTOLLVMIRTRANSLATION_H_
+#define FLANG_OPTIMIZER_DIALECT_CUF_GPUTOLLVMIRTRANSLATION_H_
+
+namespace mlir {
+class DialectRegistry;
+class MLIRContext;
+} // namespace mlir
+
+namespace cuf {
+
+/// Register the CUF dialect and the translation from it to the LLVM IR in
+/// the given registry.
+void registerCUFDialectTranslation(mlir::DialectRegistry &registry);
+
+} // namespace cuf
+
+#endif // FLANG_OPTIMIZER_DIALECT_CUF_GPUTOLLVMIRTRANSLATION_H_
diff --git a/flang/include/flang/Optimizer/Support/InitFIR.h b/flang/include/flang/Optimizer/Support/InitFIR.h
index 04a5dd323e5508..1c61c367199923 100644
--- a/flang/include/flang/Optimizer/Support/InitFIR.h
+++ b/flang/include/flang/Optimizer/Support/InitFIR.h
@@ -14,6 +14,7 @@
 #define FORTRAN_OPTIMIZER_SUPPORT_INITFIR_H
 
 #include "flang/Optimizer/Dialect/CUF/CUFDialect.h"
+#include "flang/Optimizer/Dialect/CUF/CUFToLLVMIRTranslation.h"
 #include "flang/Optimizer/Dialect/FIRDialect.h"
 #include "flang/Optimizer/HLFIR/HLFIRDialect.h"
 #include "mlir/Conversion/Passes.h"
@@ -61,6 +62,7 @@ inline void addFIRExtensions(mlir::DialectRegistry &registry,
   if (addFIRInlinerInterface)
 addFIRInlinerExtension(registry);
   addFIRToLLVMIRExtension(registry);
+  cuf::registerCUFDialectTranslation(registry);
 }
 
 inline void loadNonCodegenDialects(mlir::MLIRContext &context) {
diff --git a/flang/include/flang/Runtime/CUDA/registration.h b/flang/include/flang/Runtime/CUDA/registration.h
new file mode 100644
index 00..cbe202c4d23e0d
--- /dev/null
+++ b/flang/include/flang/Runtime/CUDA/registration.h
@@ -0,0 +1,28 @@
+//===-- include/flang/Runtime/CUDA/registration.h ---*- C -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef FORTRAN_RUNTIME_CUDA_REGISTRATION_H_
+#define FORTRAN_RUNTIME_CUDA_REGISTRATION_H_
+
+#include "flang/Runtime/entry-names.h"
+#include 
+
+namespace Fortran::runtime::cuda {
+
+extern "C" {
+
+/// Register a CUDA module.
+void *RTDECL(CUFRegisterModule)(void *data);
+
+/// Register a device function.
+void RTDECL(CUFRegisterFunction)(void **module, const char *fct);
+
+} // extern "C"
+
+} // namespace Fortran::runtime::cuda
+#endif // FORTRAN_RUNTIME_CUDA_REGISTRATION_H_
diff --git a/flang/lib/Optimizer/Dialect/CUF/CMakeLists.txt b/flang/lib/Optimizer/Dialect/CUF/CMakeLists.txt
index b222115d58..5d4bd0785971f7 100644
--- a/flang/lib/Optimizer/Dialect/CUF/CMakeLists.txt
+++ b/flang/lib/Optimizer/Dialect/CUF/CMakeLists.txt
@@ -3,6 +3,7 @@ add_subdirectory(Attributes)
 add_flang_library(CUFDialect
   CUFDialect.cpp
   CUFOps.cpp
+  CUFToLLVMIRTranslati

[llvm-branch-commits] [flang] [flang][cuda] Translate cuf.register_kernel and cuf.register_module (PR #112972)

2024-10-18 Thread Renaud Kauffmann via llvm-branch-commits


https://github.com/Renaud-K approved this pull request.

This looks really good. Thank you!

https://github.com/llvm/llvm-project/pull/112972

[llvm-branch-commits] [libc] [libc][math][c23] Add log10f16 C23 math function (PR #106091)

2024-10-18 Thread via llvm-branch-commits


https://github.com/overmighty updated 
https://github.com/llvm/llvm-project/pull/106091

>From ad3e07db169e59bc44f9759ab0b484c2abf7a8a4 Mon Sep 17 00:00:00 2001
From: OverMighty 
Date: Mon, 26 Aug 2024 17:23:01 +0200
Subject: [PATCH 1/3] [libc][math][c23] Add log10f16 C23 math function

Part of #95250.
---
 libc/config/gpu/entrypoints.txt|   1 +
 libc/config/linux/x86_64/entrypoints.txt   |   1 +
 libc/docs/math/index.rst   |   2 +-
 libc/spec/stdc.td  |   1 +
 libc/src/math/CMakeLists.txt   |   1 +
 libc/src/math/generic/CMakeLists.txt   |  21 +++
 libc/src/math/generic/expxf16.h|  14 ++
 libc/src/math/generic/log10f16.cpp | 163 +
 libc/src/math/log10f16.h   |  21 +++
 libc/test/src/math/CMakeLists.txt  |  11 ++
 libc/test/src/math/log10f16_test.cpp   |  40 +
 libc/test/src/math/smoke/CMakeLists.txt|  12 ++
 libc/test/src/math/smoke/log10f16_test.cpp |  47 ++
 13 files changed, 334 insertions(+), 1 deletion(-)
 create mode 100644 libc/src/math/generic/log10f16.cpp
 create mode 100644 libc/src/math/log10f16.h
 create mode 100644 libc/test/src/math/log10f16_test.cpp
 create mode 100644 libc/test/src/math/smoke/log10f16_test.cpp

diff --git a/libc/config/gpu/entrypoints.txt b/libc/config/gpu/entrypoints.txt
index 2cc54e8a4b970c..13bb88894297ca 100644
--- a/libc/config/gpu/entrypoints.txt
+++ b/libc/config/gpu/entrypoints.txt
@@ -567,6 +567,7 @@ if(LIBC_TYPES_HAS_FLOAT16)
 libc.src.math.llogbf16
 libc.src.math.llrintf16
 libc.src.math.llroundf16
+libc.src.math.log10f16
 libc.src.math.log2f16
 libc.src.math.logbf16
 libc.src.math.logf16
diff --git a/libc/config/linux/x86_64/entrypoints.txt b/libc/config/linux/x86_64/entrypoints.txt
index 06ea7bba81f345..6ed6ad8c2400f3 100644
--- a/libc/config/linux/x86_64/entrypoints.txt
+++ b/libc/config/linux/x86_64/entrypoints.txt
@@ -660,6 +660,7 @@ if(LIBC_TYPES_HAS_FLOAT16)
 libc.src.math.llogbf16
 libc.src.math.llrintf16
 libc.src.math.llroundf16
+libc.src.math.log10f16
 libc.src.math.log2f16
 libc.src.math.logbf16
 libc.src.math.logf16
diff --git a/libc/docs/math/index.rst b/libc/docs/math/index.rst
index 6591cbbdc15584..88751e2453f2c8 100644
--- a/libc/docs/math/index.rst
+++ b/libc/docs/math/index.rst
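@@ -312,7 +312,7 @@ Higher Math Functions
 +-----------+---------+---------+---------+---------+---------+-----------+-----------+
 | log       | |check| | |check| |         | |check| |         | 7.12.6.11 | F.10.3.11 |
 +-----------+---------+---------+---------+---------+---------+-----------+-----------+
-| log10     | |check| | |check| |         |         |         | 7.12.6.12 | F.10.3.12 |
+| log10     | |check| | |check| |         | |check| |         | 7.12.6.12 | F.10.3.12 |
 +-----------+---------+---------+---------+---------+---------+-----------+-----------+
 | log10p1   |         |         |         |         |         | 7.12.6.13 | F.10.3.13 |
 +-----------+---------+---------+---------+---------+---------+-----------+-----------+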
diff --git a/libc/spec/stdc.td b/libc/spec/stdc.td
index d2a073847503ef..33bec2f71627e5 100644
--- a/libc/spec/stdc.td
+++ b/libc/spec/stdc.td
@@ -642,6 +642,7 @@ def StdC : StandardSpec<"stdc"> {
 
   FunctionSpec<"log10", RetValSpec, [ArgSpec]>,
   FunctionSpec<"log10f", RetValSpec, [ArgSpec]>,
+  GuardedFunctionSpec<"log10f16", RetValSpec, [ArgSpec], "LIBC_TYPES_HAS_FLOAT16">,
 
   FunctionSpec<"log1p", RetValSpec, [ArgSpec]>,
   FunctionSpec<"log1pf", RetValSpec, [ArgSpec]>,
diff --git a/libc/src/math/CMakeLists.txt b/libc/src/math/CMakeLists.txt
index 516bed499b1941..9239f029abf87a 100644
--- a/libc/src/math/CMakeLists.txt
+++ b/libc/src/math/CMakeLists.txt
@@ -334,6 +334,7 @@ add_math_entrypoint_object(ldexpf128)
 
 add_math_entrypoint_object(log10)
 add_math_entrypoint_object(log10f)
+add_math_entrypoint_object(log10f16)
 
 add_math_entrypoint_object(log1p)
 add_math_entrypoint_object(log1pf)
diff --git a/libc/src/math/generic/CMakeLists.txt b/libc/src/math/generic/CMakeLists.txt
index d7c7a3431d3d95..eb7e83611fcd2e 100644
--- a/libc/src/math/generic/CMakeLists.t

[llvm-branch-commits] [libc] [libc][math][c23] Add sqrtf16 C23 math function (PR #112406)

2024-10-18 Thread via llvm-branch-commits


https://github.com/overmighty updated 
https://github.com/llvm/llvm-project/pull/112406

>From 38a132acaaa5f3c94234cea4b59c1dfdc0a49433 Mon Sep 17 00:00:00 2001
From: OverMighty 
Date: Mon, 26 Aug 2024 18:43:12 +0200
Subject: [PATCH] [libc][math][c23] Add sqrtf16 C23 math function

Part of #95250.
---
 libc/config/gpu/entrypoints.txt   |  1 +
 libc/config/linux/aarch64/entrypoints.txt |  1 +
 libc/config/linux/x86_64/entrypoints.txt  |  1 +
 libc/docs/math/index.rst  |  2 +-
 libc/spec/stdc.td |  1 +
 libc/src/__support/FPUtil/generic/sqrt.h  |  3 ++-
 libc/src/math/CMakeLists.txt  |  1 +
 libc/src/math/generic/CMakeLists.txt  | 12 ++
 libc/src/math/generic/sqrtf16.cpp | 20 
 libc/src/math/sqrtf16.h   | 21 +
 libc/test/src/math/CMakeLists.txt | 11 +
 libc/test/src/math/smoke/CMakeLists.txt   | 12 ++
 libc/test/src/math/smoke/sqrtf16_test.cpp | 13 +++
 libc/test/src/math/sqrtf16_test.cpp   | 28 +++
 14 files changed, 125 insertions(+), 2 deletions(-)
 create mode 100644 libc/src/math/generic/sqrtf16.cpp
 create mode 100644 libc/src/math/sqrtf16.h
 create mode 100644 libc/test/src/math/smoke/sqrtf16_test.cpp
 create mode 100644 libc/test/src/math/sqrtf16_test.cpp

diff --git a/libc/config/gpu/entrypoints.txt b/libc/config/gpu/entrypoints.txt
index 13bb88894297ca..38e9f2e685caed 100644
--- a/libc/config/gpu/entrypoints.txt
+++ b/libc/config/gpu/entrypoints.txt
@@ -590,6 +590,7 @@ if(LIBC_TYPES_HAS_FLOAT16)
 libc.src.math.setpayloadf16
 libc.src.math.setpayloadsigf16
 libc.src.math.sinhf16
+libc.src.math.sqrtf16
 libc.src.math.tanhf16
 libc.src.math.totalorderf16
 libc.src.math.totalordermagf16
diff --git a/libc/config/linux/aarch64/entrypoints.txt b/libc/config/linux/aarch64/entrypoints.txt
index 885827d304efe3..71c6e874429fed 100644
--- a/libc/config/linux/aarch64/entrypoints.txt
+++ b/libc/config/linux/aarch64/entrypoints.txt
@@ -680,6 +680,7 @@ if(LIBC_TYPES_HAS_FLOAT16)
 libc.src.math.setpayloadf16
 libc.src.math.setpayloadsigf16
 libc.src.math.sinpif16
+libc.src.math.sqrtf16
 libc.src.math.totalorderf16
 libc.src.math.totalordermagf16
 libc.src.math.truncf16
diff --git a/libc/config/linux/x86_64/entrypoints.txt b/libc/config/linux/x86_64/entrypoints.txt
index 6ed6ad8c2400f3..9bc63edf06f28c 100644
--- a/libc/config/linux/x86_64/entrypoints.txt
+++ b/libc/config/linux/x86_64/entrypoints.txt
@@ -684,6 +684,7 @@ if(LIBC_TYPES_HAS_FLOAT16)
 libc.src.math.setpayloadsigf16
 libc.src.math.sinhf16
 libc.src.math.sinpif16
+libc.src.math.sqrtf16
 libc.src.math.tanhf16
 libc.src.math.totalorderf16
 libc.src.math.totalordermagf16
diff --git a/libc/docs/math/index.rst b/libc/docs/math/index.rst
index 88751e2453f2c8..ce4df92393ce7f 100644
--- a/libc/docs/math/index.rst
+++ b/libc/docs/math/index.rst
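@@ -344,7 +344,7 @@ Higher Math Functions
 +-----------+---------+---------+---------+---------+---------+-----------+-----------+
 | sinpi     | |check| |         |         | |check| |         | 7.12.4.13 | F.10.1.13 |
 +-----------+---------+---------+---------+---------+---------+-----------+-----------+
-| sqrt      | |check| | |check| | |check| |         | |check| | 7.12.7.10 | F.10.4.10 |
+| sqrt      | |check| | |check| | |check| | |check| | |check| | 7.12.7.10 | F.10.4.10 |
 +-----------+---------+---------+---------+---------+---------+-----------+-----------+
 | tan       | |check| | |check| |         |         |         | 7.12.4.7  | F.10.1.7  |
 +-----------+---------+---------+---------+---------+---------+-----------+-----------+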
diff --git a/libc/spec/stdc.td b/libc/spec/stdc.td
index 33bec2f71627e5..d1ebc6ffb5821e 100644
--- a/libc/spec/stdc.td
+++ b/libc/spec/stdc.td
@@ -754,6 +754,7 @@ def StdC : StandardSpec<"stdc"> {
   FunctionSpec<"sqrt", RetValSpec, [ArgSpec]>,
   FunctionSpec<"sqrtf", RetValSpec, [ArgSpec]>,
   FunctionSpec<"sqrtl", RetValSpec, [ArgSpec]>,
+  GuardedFunctionSpec<"sqrtf16", RetValSpec, [ArgSpec], "LIBC_TYPES_HAS_FLOAT16">,
   GuardedFunctionSpec<"sqrtf128", Re

[llvm-branch-commits] [clang] [clang] Fix C23 constexpr crashes (#112708) (PR #112855)

2024-10-18 Thread Mariya Podchishchaeva via llvm-branch-commits


https://github.com/Fznamznon created 
https://github.com/llvm/llvm-project/pull/112855

Before using a constexpr variable that is not properly initialized, check that
it is valid.

Fixes https://github.com/llvm/llvm-project/issues/109095
Fixes https://github.com/llvm/llvm-project/issues/112516
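
The shape of the two patched conditions can be sketched with a toy model. `ToyVar`, `Lang`, and both functions below are hypothetical simplifications that mirror only the conditions visible in the diff. The real `VarDecl` methods perform further checks that are omitted here, and `initIsConstant` stands in for the result of actually evaluating the initializer.

```cpp
// Toy description of a variable; field names are hypothetical.
struct ToyVar {
  bool hasGlobalStorage;
  bool isConstexpr;
  bool initIsConstant; // what a real evaluator would compute
};

enum class Lang { C17, C23, CPlusPlus };

// Mirrors the patched shortcut: in C, a global that is not constexpr is
// treated as constant-initialized without evaluation; anything else
// depends on the initializer.
bool hasConstantInitialization(const ToyVar &v, Lang lang) {
  if (v.hasGlobalStorage && lang != Lang::CPlusPlus && !v.isConstexpr)
    return true;
  return v.initIsConstant;
}

// Mirrors the patched guard: C23 now also requires a constant
// initializer, as C++ already did. Other checks are omitted.
bool usableInConstantExpressions(const ToyVar &v, Lang lang) {
  if ((lang == Lang::CPlusPlus || lang == Lang::C23) &&
      !hasConstantInitialization(v, lang))
    return false;
  return true;
}
```

In this model, a local such as `constexpr int num = s11->len;` from the new test has a non-constant initializer, so it is rejected for use in constant expressions under C23.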

>From d3121f27d84c2b254d6ba551be45ce020cfa5a10 Mon Sep 17 00:00:00 2001
From: Mariya Podchishchaeva 
Date: Fri, 18 Oct 2024 10:18:34 +0200
Subject: [PATCH] [clang] Fix C23 constexpr crashes (#112708)

Before using a constexpr variable that is not properly initialized, check
that it is valid.

Fixes https://github.com/llvm/llvm-project/issues/109095
Fixes https://github.com/llvm/llvm-project/issues/112516
---
 clang/lib/AST/Decl.cpp  | 10 +++---
 clang/test/Sema/constexpr.c | 14 ++
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/clang/lib/AST/Decl.cpp b/clang/lib/AST/Decl.cpp
index 490c4a2fc525cd..bc7cce0bcd7fc2 100644
--- a/clang/lib/AST/Decl.cpp
+++ b/clang/lib/AST/Decl.cpp
@@ -2503,7 +2503,8 @@ bool VarDecl::isUsableInConstantExpressions(const ASTContext &Context) const {
   if (!DefVD->mightBeUsableInConstantExpressions(Context))
 return false;
   //   ... and its initializer is a constant initializer.
-  if (Context.getLangOpts().CPlusPlus && !DefVD->hasConstantInitialization())
+  if ((Context.getLangOpts().CPlusPlus || getLangOpts().C23) &&
+  !DefVD->hasConstantInitialization())
 return false;
   // C++98 [expr.const]p1:
   //   An integral constant-expression can involve only [...] const variables
@@ -2610,8 +2611,11 @@ bool VarDecl::hasICEInitializer(const ASTContext &Context) const {
 }
 
 bool VarDecl::hasConstantInitialization() const {
-  // In C, all globals (and only globals) have constant initialization.
-  if (hasGlobalStorage() && !getASTContext().getLangOpts().CPlusPlus)
+  // In C, all globals and constexpr variables should have constant
+  // initialization. For constexpr variables in C check that initializer is a
+  // constant initializer because they can be used in constant expressions.
+  if (hasGlobalStorage() && !getASTContext().getLangOpts().CPlusPlus &&
+  !isConstexpr())
 return true;
 
   // In C++, it depends on whether the evaluation at the point of definition
diff --git a/clang/test/Sema/constexpr.c b/clang/test/Sema/constexpr.c
index 8286cd2107d2f2..0bf3a41c18d76d 100644
--- a/clang/test/Sema/constexpr.c
+++ b/clang/test/Sema/constexpr.c
@@ -357,3 +357,17 @@ void infsNaNs() {
   constexpr double db5 = LD_SNAN; // expected-error {{constexpr initializer evaluates to nan which is not exactly representable in type 'const double'}}
   constexpr double db6 = INF;
 }
+
+void ghissue112516() {
+  struct S11 *s11 = 0;
+  constexpr int num = s11->len; // expected-error {{constexpr variable 'num' must be initialized by a constant expression}}
+  void *Arr[num];
+}
+
+void ghissue109095() {
+  constexpr char c[] = { 'a' };
+  constexpr int i = c[1]; // expected-error {{constexpr variable 'i' must be initialized by a constant expression}}\
+  // expected-note {{declared here}}
+  _Static_assert(i == c[0]); // expected-error {{static assertion expression is not an integral constant expression}}\
+ // expected-note {{initializer of 'i' is not a constant expression}}
+}


[llvm-branch-commits] [clang] [clang] Fix C23 constexpr crashes (#112708) (PR #112855)

2024-10-18 Thread Mariya Podchishchaeva via llvm-branch-commits


https://github.com/Fznamznon edited 
https://github.com/llvm/llvm-project/pull/112855

[llvm-branch-commits] [clang] [clang] Fix C23 constexpr crashes (#112708) (PR #112855)

2024-10-18 Thread Mariya Podchishchaeva via llvm-branch-commits


https://github.com/Fznamznon milestoned 
https://github.com/llvm/llvm-project/pull/112855

[llvm-branch-commits] [clang] [clang] Fix C23 constexpr crashes (#112708) (PR #112855)

2024-10-18 Thread via llvm-branch-commits


llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Mariya Podchishchaeva (Fznamznon)


Changes

Before using a constexpr variable that is not properly initialized, check that
it is valid.

Fixes https://github.com/llvm/llvm-project/issues/109095
Fixes https://github.com/llvm/llvm-project/issues/112516

---
Full diff: https://github.com/llvm/llvm-project/pull/112855.diff


2 Files Affected:

- (modified) clang/lib/AST/Decl.cpp (+7-3) 
- (modified) clang/test/Sema/constexpr.c (+14) 


``diff
diff --git a/clang/lib/AST/Decl.cpp b/clang/lib/AST/Decl.cpp
index 490c4a2fc525cd..bc7cce0bcd7fc2 100644
--- a/clang/lib/AST/Decl.cpp
+++ b/clang/lib/AST/Decl.cpp
@@ -2503,7 +2503,8 @@ bool VarDecl::isUsableInConstantExpressions(const 
ASTContext &Context) const {
   if (!DefVD->mightBeUsableInConstantExpressions(Context))
 return false;
   //   ... and its initializer is a constant initializer.
-  if (Context.getLangOpts().CPlusPlus && !DefVD->hasConstantInitialization())
+  if ((Context.getLangOpts().CPlusPlus || getLangOpts().C23) &&
+  !DefVD->hasConstantInitialization())
 return false;
   // C++98 [expr.const]p1:
   //   An integral constant-expression can involve only [...] const variables
@@ -2610,8 +2611,11 @@ bool VarDecl::hasICEInitializer(const ASTContext 
&Context) const {
 }
 
 bool VarDecl::hasConstantInitialization() const {
-  // In C, all globals (and only globals) have constant initialization.
-  if (hasGlobalStorage() && !getASTContext().getLangOpts().CPlusPlus)
+  // In C, all globals and constexpr variables should have constant
+  // initialization. For constexpr variables in C check that initializer is a
+  // constant initializer because they can be used in constant expressions.
+  if (hasGlobalStorage() && !getASTContext().getLangOpts().CPlusPlus &&
+  !isConstexpr())
 return true;
 
   // In C++, it depends on whether the evaluation at the point of definition
diff --git a/clang/test/Sema/constexpr.c b/clang/test/Sema/constexpr.c
index 8286cd2107d2f2..0bf3a41c18d76d 100644
--- a/clang/test/Sema/constexpr.c
+++ b/clang/test/Sema/constexpr.c
@@ -357,3 +357,17 @@ void infsNaNs() {
   constexpr double db5 = LD_SNAN; // expected-error {{constexpr initializer 
evaluates to nan which is not exactly representable in type 'const double'}}
   constexpr double db6 = INF;
 }
+
+void ghissue112516() {
+  struct S11 *s11 = 0;
+  constexpr int num = s11->len; // expected-error {{constexpr variable 'num' 
must be initialized by a constant expression}}
+  void *Arr[num];
+}
+
+void ghissue109095() {
+  constexpr char c[] = { 'a' };
+  constexpr int i = c[1]; // expected-error {{constexpr variable 'i' must be 
initialized by a constant expression}}\
+  // expected-note {{declared here}}
+  _Static_assert(i == c[0]); // expected-error {{static assertion expression 
is not an integral constant expression}}\
+ // expected-note {{initializer of 'i' is not a 
constant expression}}
+}

``
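The change to `VarDecl::hasConstantInitialization` in the diff above can be read as a small decision function. The following is only a simplified model, not clang's implementation: `VarModel` and both function names are hypothetical, and the `InitEvaluatesToConstant` flag stands in for the evaluation clang performs on the real initializer.

```cpp
#include <cassert>

// Hypothetical, flattened view of the properties the patched check looks at.
struct VarModel {
  bool HasGlobalStorage;
  bool IsCPlusPlus;
  bool IsConstexpr;
  bool InitEvaluatesToConstant; // placeholder for the real evaluation result
};

// Model of the check before the patch: every C global is accepted outright.
bool hasConstantInitializationBefore(const VarModel &V) {
  if (V.HasGlobalStorage && !V.IsCPlusPlus)
    return true;
  return V.InitEvaluatesToConstant;
}

// Model of the check after the patch: constexpr variables in C are no longer
// accepted outright and fall through to the evaluation result.
bool hasConstantInitializationAfter(const VarModel &V) {
  if (V.HasGlobalStorage && !V.IsCPlusPlus && !V.IsConstexpr)
    return true;
  return V.InitEvaluatesToConstant;
}
```

In this model, a C global that is `constexpr` but whose initializer does not evaluate to a constant is accepted by the old check and rejected by the new one; non-constexpr C globals behave the same in both.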




https://github.com/llvm/llvm-project/pull/112855

[llvm-branch-commits] [llvm] a173d27 - Revert "[AArch64][SVE] Enable max vector bandwidth for SVE (#109671)"

2024-10-18 Thread via llvm-branch-commits


Author: Graham Hunter
Date: 2024-10-18T11:01:21+01:00
New Revision: a173d27e4a176c07de15cc7754155c3dff0fde15

URL: 
https://github.com/llvm/llvm-project/commit/a173d27e4a176c07de15cc7754155c3dff0fde15
DIFF: 
https://github.com/llvm/llvm-project/commit/a173d27e4a176c07de15cc7754155c3dff0fde15.diff

LOG: Revert "[AArch64][SVE] Enable max vector bandwidth for SVE (#109671)"

This reverts commit c980a20b105c9298a5975b6944417f17cf772b6b.
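Judging only from the `AArch64TargetTransformInfo.cpp` hunk in this message, the revert narrows the predicate so that scalable vectors no longer request maximum bandwidth. A minimal standalone sketch of the two versions, where `RegisterKind`, `SubtargetModel`, and both function names are hypothetical stand-ins for the real LLVM types:

```cpp
#include <cassert>

// Hypothetical stand-ins for TargetTransformInfo::RegisterKind and the
// AArch64 subtarget queries used in the hunk.
enum class RegisterKind { Scalar, FixedWidthVector, ScalableVector };

struct SubtargetModel {
  bool NeonAvailable;
  bool SVEOrStreamingSVEAvailable;
};

// Behaviour being reverted: both fixed-width (NEON) and scalable (SVE)
// vectors maximize bandwidth when the matching feature is available.
bool shouldMaximizeBeforeRevert(RegisterKind K, const SubtargetModel &ST) {
  assert(K != RegisterKind::Scalar);
  return (K == RegisterKind::FixedWidthVector && ST.NeonAvailable) ||
         (K == RegisterKind::ScalableVector && ST.SVEOrStreamingSVEAvailable);
}

// Behaviour after the revert: only fixed-width vectors with NEON qualify.
bool shouldMaximizeAfterRevert(RegisterKind K, const SubtargetModel &ST) {
  assert(K != RegisterKind::Scalar);
  return K == RegisterKind::FixedWidthVector && ST.NeonAvailable;
}
```

The two functions differ only for `RegisterKind::ScalableVector` on a subtarget with SVE, which is consistent with the test updates in the diff dropping `llvm.vscale` based trip-count checks.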

Added: 


Modified: 
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll

llvm/test/Transforms/LoopVectorize/AArch64/scalable-vectorization-cost-tuning.ll
llvm/test/Transforms/LoopVectorize/AArch64/scalable-vectorization.ll
llvm/test/Transforms/LoopVectorize/AArch64/store-costs-sve.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve2-histcnt.ll
llvm/test/Transforms/LoopVectorize/AArch64/type-shrinkage-zext-costs.ll
llvm/test/Transforms/LoopVectorize/AArch64/wider-VF-for-callinst.ll

Removed: 




diff  --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp 
b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 7c6b789b9c1b72..ff3c69f7e10c66 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -337,10 +337,8 @@ AArch64TTIImpl::getInlineCallPenalty(const Function *F, 
const CallBase &Call,
 bool AArch64TTIImpl::shouldMaximizeVectorBandwidth(
 TargetTransformInfo::RegisterKind K) const {
   assert(K != TargetTransformInfo::RGK_Scalar);
-  return ((K == TargetTransformInfo::RGK_FixedWidthVector &&
-   ST->isNeonAvailable()) ||
-  (K == TargetTransformInfo::RGK_ScalableVector &&
-   ST->isSVEorStreamingSVEAvailable()));
+  return (K == TargetTransformInfo::RGK_FixedWidthVector &&
+  ST->isNeonAvailable());
 }
 
 /// Calculate the cost of materializing a 64-bit value. This helper

diff  --git 
a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll 
b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
index 01fca39296da09..7f325ce1a1f04b 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
@@ -732,20 +732,9 @@ define void @multiple_exit_conditions(ptr %src, ptr 
noalias %dst) #1 {
 ; DEFAULT-LABEL: define void @multiple_exit_conditions(
 ; DEFAULT-SAME: ptr [[SRC:%.*]], ptr noalias [[DST:%.*]]) #[[ATTR2:[0-9]+]] {
 ; DEFAULT-NEXT:  entry:
-; DEFAULT-NEXT:[[TMP7:%.*]] = call i64 @llvm.vscale.i64()
-; DEFAULT-NEXT:[[TMP8:%.*]] = mul i64 [[TMP7]], 32
-; DEFAULT-NEXT:[[MIN_ITERS_CHECK:%.*]] = icmp ult i64 257, [[TMP8]]
-; DEFAULT-NEXT:br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label 
[[VECTOR_PH:%.*]]
+; DEFAULT-NEXT:br i1 false, label [[SCALAR_PH:%.*]], label 
[[VECTOR_PH:%.*]]
 ; DEFAULT:   vector.ph:
-; DEFAULT-NEXT:[[TMP2:%.*]] = call i64 @llvm.vscale.i64()
-; DEFAULT-NEXT:[[TMP3:%.*]] = mul i64 [[TMP2]], 32
-; DEFAULT-NEXT:[[N_MOD_VF:%.*]] = urem i64 257, [[TMP3]]
-; DEFAULT-NEXT:[[N_VEC:%.*]] = sub i64 257, [[N_MOD_VF]]
-; DEFAULT-NEXT:[[TMP17:%.*]] = mul i64 [[N_VEC]], 8
-; DEFAULT-NEXT:[[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 
[[TMP17]]
-; DEFAULT-NEXT:[[IND_END1:%.*]] = mul i64 [[N_VEC]], 2
-; DEFAULT-NEXT:[[TMP5:%.*]] = call i64 @llvm.vscale.i64()
-; DEFAULT-NEXT:[[TMP6:%.*]] = mul i64 [[TMP5]], 32
+; DEFAULT-NEXT:[[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 2048
 ; DEFAULT-NEXT:br label [[VECTOR_BODY:%.*]]
 ; DEFAULT:   vector.body:
 ; DEFAULT-NEXT:[[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ 
[[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -753,39 +742,20 @@ define void @multiple_exit_conditions(ptr %src, ptr 
noalias %dst) #1 {
 ; DEFAULT-NEXT:[[TMP0:%.*]] = add i64 [[OFFSET_IDX]], 0
 ; DEFAULT-NEXT:[[NEXT_GEP:%.*]] = getelementptr i8, ptr [[DST]], i64 
[[TMP0]]
 ; DEFAULT-NEXT:[[TMP1:%.*]] = load i16, ptr [[SRC]], align 2
-; DEFAULT-NEXT:[[BROADCAST_SPLATINSERT:%.*]] = insertelement  poison, i16 [[TMP1]], i64 0
-; DEFAULT-NEXT:[[BROADCAST_SPLAT:%.*]] = shufflevector  
[[BROADCAST_SPLATINSERT]],  poison,  
zeroinitializer
-; DEFAULT-NEXT:[[TMP9:%.*]] = or  [[BROADCAST_SPLAT]], 
shufflevector ( insertelement ( poison, i16 
1, i64 0),  poison,  zeroinitializer)
-; DEFAULT-NEXT:[[TMP10:%.*]] = or  [[BROADCAST_SPLAT]], 
shufflevector ( insertelement ( poison, i16 
1, i64 0),  poison,  zeroinitializer)
-; DEFAULT-NEXT:[[TMP11:%.*]] = or  [[BROADCAST_SPLAT]], 
shufflevector ( insertelement ( poison, i16 
1, i64 0),  poison,  zeroinitializer)
-; DEFAULT-NEXT:[[TMP12:%.*]] = or  [[BROADCAST_SPLAT]], 
shufflevector ( insertelement ( poison, i16 
1, i64 0),  poison,  zeroinitializer)

[llvm-branch-commits] [clang] [clang] Fix C23 constexpr crashes (#112708) (PR #112855)

2024-10-18 Thread Mariya Podchishchaeva via llvm-branch-commits


https://github.com/Fznamznon updated 
https://github.com/llvm/llvm-project/pull/112855

>From fb40fba6f9caf44a1e839525efdaebdf936c2934 Mon Sep 17 00:00:00 2001
From: Mariya Podchishchaeva 
Date: Fri, 18 Oct 2024 10:18:34 +0200
Subject: [PATCH] [clang] Fix C23 constexpr crashes (#112708)

Before using a constexpr variable that is not properly initialized check
that it is valid.

Fixes https://github.com/llvm/llvm-project/issues/109095
Fixes https://github.com/llvm/llvm-project/issues/112516
---
 clang/lib/AST/Decl.cpp  | 10 +++---
 clang/test/Sema/constexpr.c | 17 +
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/clang/lib/AST/Decl.cpp b/clang/lib/AST/Decl.cpp
index 490c4a2fc525cd..bc7cce0bcd7fc2 100644
--- a/clang/lib/AST/Decl.cpp
+++ b/clang/lib/AST/Decl.cpp
@@ -2503,7 +2503,8 @@ bool VarDecl::isUsableInConstantExpressions(const 
ASTContext &Context) const {
   if (!DefVD->mightBeUsableInConstantExpressions(Context))
 return false;
   //   ... and its initializer is a constant initializer.
-  if (Context.getLangOpts().CPlusPlus && !DefVD->hasConstantInitialization())
+  if ((Context.getLangOpts().CPlusPlus || getLangOpts().C23) &&
+  !DefVD->hasConstantInitialization())
 return false;
   // C++98 [expr.const]p1:
   //   An integral constant-expression can involve only [...] const variables
@@ -2610,8 +2611,11 @@ bool VarDecl::hasICEInitializer(const ASTContext 
&Context) const {
 }
 
 bool VarDecl::hasConstantInitialization() const {
-  // In C, all globals (and only globals) have constant initialization.
-  if (hasGlobalStorage() && !getASTContext().getLangOpts().CPlusPlus)
+  // In C, all globals and constexpr variables should have constant
+  // initialization. For constexpr variables in C check that initializer is a
+  // constant initializer because they can be used in constant expressions.
+  if (hasGlobalStorage() && !getASTContext().getLangOpts().CPlusPlus &&
+  !isConstexpr())
 return true;
 
   // In C++, it depends on whether the evaluation at the point of definition
diff --git a/clang/test/Sema/constexpr.c b/clang/test/Sema/constexpr.c
index 8286cd2107d2f2..fe014fadb11ec8 100644
--- a/clang/test/Sema/constexpr.c
+++ b/clang/test/Sema/constexpr.c
@@ -357,3 +357,20 @@ void infsNaNs() {
   constexpr double db5 = LD_SNAN; // expected-error {{constexpr initializer 
evaluates to nan which is not exactly representable in type 'const double'}}
   constexpr double db6 = INF;
 }
+
+struct S11 {
+  int len;
+};
+void ghissue112516() {
+  struct S11 *s11 = 0;
+  constexpr int num = s11->len; // expected-error {{constexpr variable 'num' 
must be initialized by a constant expression}}
+  void *Arr[num];
+}
+
+void ghissue109095() {
+  constexpr char c[] = { 'a' };
+  constexpr int i = c[1]; // expected-error {{constexpr variable 'i' must be 
initialized by a constant expression}}\
+  // expected-note {{declared here}}
+  _Static_assert(i == c[0]); // expected-error {{static assertion expression 
is not an integral constant expression}}\
+ // expected-note {{initializer of 'i' is not a 
constant expression}}
+}


[llvm-branch-commits] [llvm] [CGData] Global Merge Functions (PR #112671)

2024-10-18 Thread Thorsten Schütt via llvm-branch-commits



@@ -0,0 +1,77 @@
+//===-- GlobalMergeFunctions.h - Global merge functions -*- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+///
+/// This file defines global merge functions pass and related data structure.
+///
+//===--===//
+
+#ifndef PIKA_TRANSFORMS_UTILS_GLOBALMERGEFUNCTIONS_H

tschuett wrote:

`PIKA`

https://github.com/llvm/llvm-project/pull/112671

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RBLegalize rules for load (PR #112882)

2024-10-18 Thread Petar Avramovic via llvm-branch-commits


https://github.com/petar-avramovic created 
https://github.com/llvm/llvm-project/pull/112882

Add IDs for bit widths that cover multiple LLTs: B32, B64, etc.
Add a "Predicate" wrapper class for bool predicate functions, used to
write readable rules. Predicates can be combined using &&, || and !.
Add lowering for splitting and widening loads.
Write rules for loads so that existing MIR tests from the old
regbankselect do not change.
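
The idea of a predicate wrapper that composes with `&&`, `||` and `!` can be sketched independently of the patch. The version below is an illustration under assumptions, not the class from this PR: `FakeInstr` is a hypothetical stand-in for the machine instruction being inspected, and `std::function` is used for brevity.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for the instruction the real predicates inspect.
struct FakeInstr {
  unsigned SizeInBits;
  bool IsUniform;
};

// Wraps a bool-returning function so rules can be composed with operators.
class Predicate {
public:
  using Fn = std::function<bool(const FakeInstr &)>;

  explicit Predicate(Fn F) : Pred(std::move(F)) {}

  bool operator()(const FakeInstr &MI) const { return Pred(MI); }

  // The combined predicate still short-circuits when it is evaluated,
  // because the built-in && and || are used inside the lambdas.
  Predicate operator&&(const Predicate &RHS) const {
    Fn L = Pred, R = RHS.Pred;
    return Predicate([L, R](const FakeInstr &MI) { return L(MI) && R(MI); });
  }

  Predicate operator||(const Predicate &RHS) const {
    Fn L = Pred, R = RHS.Pred;
    return Predicate([L, R](const FakeInstr &MI) { return L(MI) || R(MI); });
  }

  Predicate operator!() const {
    Fn P = Pred;
    return Predicate([P](const FakeInstr &MI) { return !P(MI); });
  }

private:
  Fn Pred;
};
```

With two base predicates such as `IsB32` and `IsUniform`, a rule condition can then be written as `IsB32 && !IsUniform` and applied to an instruction like an ordinary function call.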

>From 5cc5acd4754d7a78fa7df9ae45542e3c1561f13b Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Thu, 17 Oct 2024 16:39:55 +0200
Subject: [PATCH] AMDGPU/GlobalISel: RBLegalize rules for load

Add IDs for bit width that cover multiple LLTs: B32 B64 etc.
"Predicate" wrapper class for bool predicate functions used to
write pretty rules. Predicates can be combined using &&, || and !.
Lowering for splitting and widening loads.
Write rules for loads to not change existing mir tests from old
regbankselect.
---
 .../Target/AMDGPU/AMDGPURBLegalizeHelper.cpp  | 302 -
 .../Target/AMDGPU/AMDGPURBLegalizeHelper.h|   7 +-
 .../Target/AMDGPU/AMDGPURBLegalizeRules.cpp   | 307 -
 .../lib/Target/AMDGPU/AMDGPURBLegalizeRules.h |  65 +++-
 .../AMDGPU/GlobalISel/regbankselect-load.mir  | 320 +++---
 .../GlobalISel/regbankselect-zextload.mir |   9 +-
 6 files changed, 942 insertions(+), 68 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURBLegalizeHelper.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURBLegalizeHelper.cpp
index a0f6ecedab7a83..f58f0a315096d2 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURBLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURBLegalizeHelper.cpp
@@ -37,6 +37,97 @@ bool 
RegBankLegalizeHelper::findRuleAndApplyMapping(MachineInstr &MI) {
   return true;
 }
 
+void RegBankLegalizeHelper::splitLoad(MachineInstr &MI,
+  ArrayRef<LLT> LLTBreakdown, LLT MergeTy) {
+  MachineFunction &MF = B.getMF();
+  assert(MI.getNumMemOperands() == 1);
+  MachineMemOperand &BaseMMO = **MI.memoperands_begin();
+  Register Dst = MI.getOperand(0).getReg();
+  const RegisterBank *DstRB = MRI.getRegBankOrNull(Dst);
+  Register BasePtrReg = MI.getOperand(1).getReg();
+  LLT PtrTy = MRI.getType(BasePtrReg);
+  const RegisterBank *PtrRB = MRI.getRegBankOrNull(BasePtrReg);
+  LLT OffsetTy = LLT::scalar(PtrTy.getSizeInBits());
+  SmallVector LoadPartRegs;
+
+  unsigned ByteOffset = 0;
+  for (LLT PartTy : LLTBreakdown) {
+Register BasePtrPlusOffsetReg;
+if (ByteOffset == 0) {
+  BasePtrPlusOffsetReg = BasePtrReg;
+} else {
+  BasePtrPlusOffsetReg = MRI.createVirtualRegister({PtrRB, PtrTy});
+  Register OffsetReg = MRI.createVirtualRegister({PtrRB, OffsetTy});
+  B.buildConstant(OffsetReg, ByteOffset);
+  B.buildPtrAdd(BasePtrPlusOffsetReg, BasePtrReg, OffsetReg);
+}
+MachineMemOperand *BasePtrPlusOffsetMMO =
+MF.getMachineMemOperand(&BaseMMO, ByteOffset, PartTy);
+Register PartLoad = MRI.createVirtualRegister({DstRB, PartTy});
+B.buildLoad(PartLoad, BasePtrPlusOffsetReg, *BasePtrPlusOffsetMMO);
+LoadPartRegs.push_back(PartLoad);
+ByteOffset += PartTy.getSizeInBytes();
+  }
+
+  if (!MergeTy.isValid()) {
+// Loads are of same size, concat or merge them together.
+B.buildMergeLikeInstr(Dst, LoadPartRegs);
+  } else {
+// Load(s) are not all of same size, need to unmerge them to smaller pieces
+// of MergeTy type, then merge them all together in Dst.
+SmallVector MergeTyParts;
+for (Register Reg : LoadPartRegs) {
+  if (MRI.getType(Reg) == MergeTy) {
+MergeTyParts.push_back(Reg);
+  } else {
+auto Unmerge = B.buildUnmerge(MergeTy, Reg);
+for (unsigned i = 0; i < Unmerge->getNumOperands() - 1; ++i) {
+  Register UnmergeReg = Unmerge->getOperand(i).getReg();
+  MRI.setRegBank(UnmergeReg, *DstRB);
+  MergeTyParts.push_back(UnmergeReg);
+}
+  }
+}
+B.buildMergeLikeInstr(Dst, MergeTyParts);
+  }
+  MI.eraseFromParent();
+}
+
+void RegBankLegalizeHelper::widenLoad(MachineInstr &MI, LLT WideTy,
+  LLT MergeTy) {
+  MachineFunction &MF = B.getMF();
+  assert(MI.getNumMemOperands() == 1);
+  MachineMemOperand &BaseMMO = **MI.memoperands_begin();
+  Register Dst = MI.getOperand(0).getReg();
+  const RegisterBank *DstRB = MRI.getRegBankOrNull(Dst);
+  Register BasePtrReg = MI.getOperand(1).getReg();
+
+  Register BasePtrPlusOffsetReg;
+  BasePtrPlusOffsetReg = BasePtrReg;
+
+  MachineMemOperand *BasePtrPlusOffsetMMO =
+  MF.getMachineMemOperand(&BaseMMO, 0, WideTy);
+  Register WideLoad = MRI.createVirtualRegister({DstRB, WideTy});
+  B.buildLoad(WideLoad, BasePtrPlusOffsetReg, *BasePtrPlusOffsetMMO);
+
+  if (WideTy.isScalar()) {
+B.buildTrunc(Dst, WideLoad);
+  } else {
+SmallVector MergeTyParts;
+unsigned NumEltsMerge =
+MRI.getType(Dst).getSizeInBits() / MergeTy.getSizeInBits();
+auto Unmerge = B.buildUnmerge(MergeTy, WideLoad);
+

[llvm-branch-commits] [llvm] MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi (PR #112866)

2024-10-18 Thread Petar Avramovic via llvm-branch-commits


https://github.com/petar-avramovic edited 
https://github.com/llvm/llvm-project/pull/112866

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RBLegalize rules for load (PR #112882)

2024-10-18 Thread Petar Avramovic via llvm-branch-commits


petar-avramovic wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

* **#112866**
* **#112882** 👈
* **#112864**: 1 other dependent PR 
([#112865](https://github.com/llvm/llvm-project/pull/112865))
* **#112863**
* **#112862**
* `main`

This stack of pull requests is managed by Graphite.

https://github.com/llvm/llvm-project/pull/112882

[llvm-branch-commits] [llvm] [CGData] Global Merge Functions (PR #112671)

2024-10-18 Thread Thorsten Schütt via llvm-branch-commits



@@ -0,0 +1,77 @@
+//===-- GlobalMergeFunctions.h - Global merge functions -*- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+///
+/// This file defines global merge functions pass and related data structure.
+///
+//===--===//
+
+#ifndef PIKA_TRANSFORMS_UTILS_GLOBALMERGEFUNCTIONS_H
+#define PIKA_TRANSFORMS_UTILS_GLOBALMERGEFUNCTIONS_H
+
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/StableHashing.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/CGData/StableFunctionMap.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Pass.h"
+#include 
+#include 
+
+enum class HashFunctionMode {
+  Local,
+  BuildingHashFuncion,
+  UsingHashFunction,
+};
+
+namespace llvm {
+
+// A vector of locations (the pair of (instruction, operand) indices) reachable
+// from a parameter.
+using ParamLocs = SmallVector;
+// A vector of parameters
+using ParamLocsVecTy = SmallVector;
+// A map of stable hash to a vector of stable functions
+
+/// GlobalMergeFunc finds functions which only differ by constants in
+/// certain instructions, e.g. resulting from specialized functions of layout
+/// compatible types.
+/// Unlike PikaMergeFunc that directly compares IRs, this uses stable function

tschuett wrote:

`PikaMergeFunc`

https://github.com/llvm/llvm-project/pull/112671
