[llvm-branch-commits] [llvm] release/19.x: Revert "[CGData] llvm-cgdata (#89884)" (PR #103886)

2024-08-15 Thread Kyungwoo Lee via llvm-branch-commits

kyulee-com wrote:

> > So we should remove this tool from the 19.x release? Can someone confirm?
> 
> @kyulee-com @thevinster Are you two able to help confirm this?

Yeah. I think we should remove this from the release as it was reverted.
We plan to re-land it via https://github.com/llvm/llvm-project/pull/101461 once 
it gets approved.

https://github.com/llvm/llvm-project/pull/103886
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/19.x: Revert "[CGData] llvm-cgdata (#89884)" (PR #103886)

2024-08-16 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com approved this pull request.


https://github.com/llvm/llvm-project/pull/103886


[llvm-branch-commits] [clang] [llvm] Thin11 (PR #111464)

2024-10-07 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com created 
https://github.com/llvm/llvm-project/pull/111464

None

From 1249f0411388fb0832b49e80e7b6a0985822b026 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Fri, 13 Sep 2024 08:51:00 -0700
Subject: [PATCH 1/4] [CGData][ThinLTO] Global Outlining with Two-CodeGen
 Rounds

---
 llvm/include/llvm/CGData/CodeGenData.h|  16 +++
 llvm/lib/CGData/CodeGenData.cpp   |  81 +-
 llvm/lib/LTO/CMakeLists.txt   |   1 +
 llvm/lib/LTO/LTO.cpp  | 103 +-
 llvm/lib/LTO/LTOBackend.cpp   |  11 ++
 .../test/ThinLTO/AArch64/cgdata-two-rounds.ll |  94 
 llvm/test/ThinLTO/AArch64/lit.local.cfg   |   2 +
 7 files changed, 302 insertions(+), 6 deletions(-)
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-two-rounds.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/lit.local.cfg

diff --git a/llvm/include/llvm/CGData/CodeGenData.h b/llvm/include/llvm/CGData/CodeGenData.h
index 84133a433170fe..1e1afe99327650 100644
--- a/llvm/include/llvm/CGData/CodeGenData.h
+++ b/llvm/include/llvm/CGData/CodeGenData.h
@@ -164,6 +164,22 @@ publishOutlinedHashTree(std::unique_ptr<OutlinedHashTree> HashTree) {
   CodeGenData::getInstance().publishOutlinedHashTree(std::move(HashTree));
 }
 
+/// Initialize the two-codegen rounds.
+void initializeTwoCodegenRounds();
+
+/// Save the current module before the first codegen round.
+void saveModuleForTwoRounds(const Module &TheModule, unsigned Task);
+
+/// Load the current module before the second codegen round.
+std::unique_ptr loadModuleForTwoRounds(BitcodeModule &OrigModule,
+   unsigned Task,
+   LLVMContext &Context);
+
+/// Merge the codegen data from the input files in scratch vector in ThinLTO
+/// two-codegen rounds.
+Error mergeCodeGenData(
+const std::unique_ptr>> InputFiles);
+
 void warn(Error E, StringRef Whence = "");
 void warn(Twine Message, std::string Whence = "", std::string Hint = "");
 
diff --git a/llvm/lib/CGData/CodeGenData.cpp b/llvm/lib/CGData/CodeGenData.cpp
index 55d2504231c744..ff8e5dd7c75790 100644
--- a/llvm/lib/CGData/CodeGenData.cpp
+++ b/llvm/lib/CGData/CodeGenData.cpp
@@ -17,6 +17,7 @@
 #include "llvm/Object/ObjectFile.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/FileSystem.h"
+#include "llvm/Support/Path.h"
 #include "llvm/Support/WithColor.h"
 
 #define DEBUG_TYPE "cg-data"
@@ -30,6 +31,14 @@ cl::opt
 cl::opt
 CodeGenDataUsePath("codegen-data-use-path", cl::init(""), cl::Hidden,
cl::desc("File path to where .cgdata file is read"));
+cl::opt CodeGenDataThinLTOTwoRounds(
+"codegen-data-thinlto-two-rounds", cl::init(false), cl::Hidden,
+cl::desc("Enable two-round ThinLTO code generation. The first round "
+ "emits codegen data, while the second round uses the emitted "
+ "codegen data for further optimizations."));
+
+// Path to where the optimized bitcodes are saved and restored for ThinLTO.
+static SmallString<128> CodeGenDataThinLTOTwoRoundsPath;
 
 static std::string getCGDataErrString(cgdata_error Err,
   const std::string &ErrMsg = "") {
@@ -139,7 +148,7 @@ CodeGenData &CodeGenData::getInstance() {
   std::call_once(CodeGenData::OnceFlag, []() {
 Instance = std::unique_ptr(new CodeGenData());
 
-if (CodeGenDataGenerate)
+if (CodeGenDataGenerate || CodeGenDataThinLTOTwoRounds)
   Instance->EmitCGData = true;
 else if (!CodeGenDataUsePath.empty()) {
   // Initialize the global CGData if the input file name is given.
@@ -215,6 +224,76 @@ void warn(Error E, StringRef Whence) {
   }
 }
 
+static std::string getPath(StringRef Dir, unsigned Task) {
+  return (Dir + "/" + llvm::Twine(Task) + ".saved_copy.bc").str();
+}
+
+void initializeTwoCodegenRounds() {
+  assert(CodeGenDataThinLTOTwoRounds);
+  if (auto EC = llvm::sys::fs::createUniqueDirectory(
+  "cgdata", CodeGenDataThinLTOTwoRoundsPath))
+report_fatal_error(Twine("Failed to create directory: ") + EC.message());
+}
+
+void saveModuleForTwoRounds(const Module &TheModule, unsigned Task) {
+  assert(sys::fs::is_directory(CodeGenDataThinLTOTwoRoundsPath));
+  std::string Path = getPath(CodeGenDataThinLTOTwoRoundsPath, Task);
+  std::error_code EC;
+  raw_fd_ostream OS(Path, EC, sys::fs::OpenFlags::OF_None);
+  if (EC)
+report_fatal_error(Twine("Failed to open ") + Path +
+   " to save optimized bitcode: " + EC.message());
+  WriteBitcodeToFile(TheModule, OS, /* ShouldPreserveUseListOrder */ true);
+}
+
+std::unique_ptr loadModuleForTwoRounds(BitcodeModule &OrigModule,
+   unsigned Task,
+   LLVMContext &Context) {
+  assert(sys::fs::is_directory(CodeGenDataThinLTOTwoRoundsPath));
+  std::string Path = getPath(CodeGenDataThinLTO

[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-18 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com ready_for_review 
https://github.com/llvm/llvm-project/pull/112638


[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-18 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com edited 
https://github.com/llvm/llvm-project/pull/112638


[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-19 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/112638

From 6225d74229d41068c57109a24b063f6fcba13985 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Wed, 16 Oct 2024 17:09:07 -0700
Subject: [PATCH 1/3] [StructuralHash] Support Differences

This computes a structural hash while allowing for selective ignoring of
certain operands based on a custom function that is provided.
Instead of a single hash value, it now returns FunctionHashInfo which
includes a hash value, an instruction mapping, and a map to track the
operand location and its corresponding hash value that is ignored.
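
The idea described above can be sketched outside LLVM. The following is only a minimal illustration, assuming toy stand-ins: `ToyInst`, `ToyHashInfo`, `toyHashString`, `toyCombine` and `toyHashWithDifferences` are hypothetical names that do not exist in the patch, and FNV-1a is swapped in for LLVM's stable hashing purely because it is deterministic.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; none of these are LLVM types.
struct ToyInst {
  std::string Opcode;
  std::vector<std::string> Operands;
};

using ToyIndexPair = std::pair<unsigned, unsigned>; // (instruction, operand)
using ToyIgnoreFunc = std::function<bool(const ToyInst &, unsigned)>;

struct ToyHashInfo {
  uint64_t FunctionHash = 0;
  // Hashes of the operands that were left out of FunctionHash.
  std::map<ToyIndexPair, uint64_t> IgnoredOperandHashes;
};

// FNV-1a over a string (an assumption; the patch uses LLVM's stable hashing).
inline uint64_t toyHashString(const std::string &S) {
  uint64_t H = 1469598103934665603ULL;
  for (unsigned char C : S) {
    H ^= C;
    H *= 1099511628211ULL;
  }
  return H;
}

inline uint64_t toyCombine(uint64_t Seed, uint64_t V) {
  return (Seed ^ V) * 1099511628211ULL;
}

inline ToyHashInfo toyHashWithDifferences(const std::vector<ToyInst> &Insts,
                                          const ToyIgnoreFunc &Ignore) {
  ToyHashInfo Info;
  uint64_t H = 1469598103934665603ULL;
  for (unsigned I = 0; I < Insts.size(); ++I) {
    H = toyCombine(H, toyHashString(Insts[I].Opcode));
    for (unsigned J = 0; J < Insts[I].Operands.size(); ++J) {
      uint64_t OpH = toyHashString(Insts[I].Operands[J]);
      if (Ignore && Ignore(Insts[I], J)) {
        // Ignored operands stay out of the function hash but are recorded
        // by location so that the differences can be inspected later.
        Info.IgnoredOperandHashes[{I, J}] = OpH;
        continue;
      }
      H = toyCombine(H, OpH);
    }
  }
  Info.FunctionHash = H;
  return Info;
}
```

With a callback that ignores operand 0 of every `call`, two toy functions that differ only in their callee get the same `FunctionHash`, while `IgnoredOperandHashes` still tells the two callees apart.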
---
 llvm/include/llvm/IR/StructuralHash.h|  46 ++
 llvm/lib/IR/StructuralHash.cpp   | 188 +--
 llvm/unittests/IR/StructuralHashTest.cpp |  55 +++
 3 files changed, 275 insertions(+), 14 deletions(-)

diff --git a/llvm/include/llvm/IR/StructuralHash.h b/llvm/include/llvm/IR/StructuralHash.h
index aa292bc3446799..bc82c204c4d1f6 100644
--- a/llvm/include/llvm/IR/StructuralHash.h
+++ b/llvm/include/llvm/IR/StructuralHash.h
@@ -14,7 +14,9 @@
 #ifndef LLVM_IR_STRUCTURALHASH_H
 #define LLVM_IR_STRUCTURALHASH_H
 
+#include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/StableHashing.h"
+#include "llvm/IR/Instruction.h"
 #include 
 
 namespace llvm {
@@ -23,6 +25,7 @@ class Function;
 class Module;
 
 using IRHash = stable_hash;
+using OpndHash = stable_hash;
 
 /// Returns a hash of the function \p F.
 /// \param F The function to hash.
@@ -37,6 +40,49 @@ IRHash StructuralHash(const Function &F, bool DetailedHash = false);
 /// composed the module hash.
 IRHash StructuralHash(const Module &M, bool DetailedHash = false);
 
+/// The pair of an instruction index and a operand index.
+using IndexPair = std::pair<unsigned, unsigned>;
+
+/// A map from an instruction index to an instruction pointer.
+using IndexInstrMap = MapVector<unsigned, Instruction *>;
+
+/// A map from an IndexPair to an OpndHash.
+using IndexOperandHashMapType = DenseMap<IndexPair, OpndHash>;
+
+/// A function that takes an instruction and an operand index and returns true
+/// if the operand should be ignored in the function hash computation.
+using IgnoreOperandFunc = std::function<bool(const Instruction *, unsigned)>;
+
+struct FunctionHashInfo {
+  /// A hash value representing the structural content of the function
+  IRHash FunctionHash;
+  /// A mapping from instruction indices to instruction pointers
+  std::unique_ptr<IndexInstrMap> IndexInstruction;
+  /// A mapping from pairs of instruction indices and operand indices
+  /// to the hashes of the operands. This can be used to analyze or
+  /// reconstruct the differences in ignored operands
+  std::unique_ptr<IndexOperandHashMapType> IndexOperandHashMap;
+
+  FunctionHashInfo(IRHash FuntionHash,
+   std::unique_ptr<IndexInstrMap> IndexInstruction,
+   std::unique_ptr<IndexOperandHashMapType> IndexOperandHashMap)
+  : FunctionHash(FuntionHash),
+IndexInstruction(std::move(IndexInstruction)),
+IndexOperandHashMap(std::move(IndexOperandHashMap)) {}
+};
+
+/// Computes a structural hash of a given function, considering the structure
+/// and content of the function's instructions while allowing for selective
+/// ignoring of certain operands based on custom criteria. This hash can be used
+/// to identify functions that are structurally similar or identical, which is
+/// useful in optimizations, deduplication, or analysis tasks.
+/// \param F The function to hash.
+/// \param IgnoreOp A callable that takes an instruction and an operand index,
+/// and returns true if the operand should be ignored in the hash computation.
+/// \return A FunctionHashInfo structure
+FunctionHashInfo StructuralHashWithDifferences(const Function &F,
+   IgnoreOperandFunc IgnoreOp);
+
 } // end namespace llvm
 
 #endif
diff --git a/llvm/lib/IR/StructuralHash.cpp b/llvm/lib/IR/StructuralHash.cpp
index a1fabab77d52b2..6e0af666010a05 100644
--- a/llvm/lib/IR/StructuralHash.cpp
+++ b/llvm/lib/IR/StructuralHash.cpp
@@ -28,6 +28,19 @@ class StructuralHashImpl {
 
   bool DetailedHash;
 
+  /// IgnoreOp is a function that returns true if the operand should be ignored.
+  IgnoreOperandFunc IgnoreOp = nullptr;
+  /// A mapping from instruction indices to instruction pointers.
+  /// The index represents the position of an instruction based on the order in
+  /// which it is first encountered.
+  std::unique_ptr<IndexInstrMap> IndexInstruction = nullptr;
+  /// A mapping from pairs of instruction indices and operand indices
+  /// to the hashes of the operands.
+  std::unique_ptr<IndexOperandHashMapType> IndexOperandHashMap = nullptr;
+
+  /// Assign a unique ID to each Value in the order they are first seen.
+  DenseMap ValueToId;
+
   // This will produce different values on 32-bit and 64-bit systens as
   // hash_combine returns a size_t. However, this is only used for
   // detailed hashing which, in-tree, only needs to distinguish between
@@ -47,24 +60,140 @@ class StructuralHashImpl {
 
 public:
   StructuralHashImpl() = delete;
-  explicit StructuralHashImpl(bool DetailedHash) : DetailedHas

[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-19 Thread Kyungwoo Lee via llvm-branch-commits


@@ -47,24 +60,140 @@ class StructuralHashImpl {
 
 public:
   StructuralHashImpl() = delete;
-  explicit StructuralHashImpl(bool DetailedHash) : DetailedHash(DetailedHash) {}
+  explicit StructuralHashImpl(bool DetailedHash,
+  IgnoreOperandFunc IgnoreOp = nullptr)
+  : DetailedHash(DetailedHash), IgnoreOp(IgnoreOp) {
+if (IgnoreOp) {
+  IndexInstruction = std::make_unique();
+  IndexOperandHashMap = std::make_unique();
+}
+  }
 
-  stable_hash hashConstant(Constant *C) {
+  stable_hash hashAPInt(const APInt &I) {
 SmallVector Hashes;
-// TODO: hashArbitaryType() is not stable.
-if (ConstantInt *ConstInt = dyn_cast(C)) {
-  Hashes.emplace_back(hashArbitaryType(ConstInt->getValue()));
-} else if (ConstantFP *ConstFP = dyn_cast(C)) {
-  Hashes.emplace_back(hashArbitaryType(ConstFP->getValue()));
-} else if (Function *Func = dyn_cast(C))
-  // Hashing the name will be deterministic as LLVM's hashing infrastructure
-  // has explicit support for hashing strings and will not simply hash
-  // the pointer.
-  Hashes.emplace_back(hashArbitaryType(Func->getName()));
+Hashes.emplace_back(I.getBitWidth());
+for (unsigned J = 0; J < I.getNumWords(); ++J)
+  Hashes.emplace_back((I.getRawData())[J]);
+return stable_hash_combine(Hashes);
+  }
 
+  stable_hash hashAPFloat(const APFloat &F) {
+SmallVector Hashes;
+const fltSemantics &S = F.getSemantics();
+Hashes.emplace_back(APFloat::semanticsPrecision(S));
+Hashes.emplace_back(APFloat::semanticsMaxExponent(S));
+Hashes.emplace_back(APFloat::semanticsMinExponent(S));
+Hashes.emplace_back(APFloat::semanticsSizeInBits(S));
+Hashes.emplace_back(hashAPInt(F.bitcastToAPInt()));
 return stable_hash_combine(Hashes);
   }
 
+  stable_hash hashGlobalValue(const GlobalValue *GV) {
+if (!GV->hasName())
+  return 0;
+return stable_hash_name(GV->getName());
+  }
+
+  // Compute a hash for a Constant. This function is logically similar to
+  // FunctionComparator::cmpConstants() in FunctionComparator.cpp, but here
+  // we're interested in computing a hash rather than comparing two Constants.
+  // Some of the logic is simplified, e.g, we don't expand GEPOperator.
+  stable_hash hashConstant(Constant *C) {
+SmallVector Hashes;
+
+Type *Ty = C->getType();
+Hashes.emplace_back(hashType(Ty));
+
+if (C->isNullValue()) {
+  Hashes.emplace_back(static_cast('N'));
+  return stable_hash_combine(Hashes);
+}
+
+auto *G = dyn_cast(C);
+if (G) {
+  Hashes.emplace_back(hashGlobalValue(G));
+  return stable_hash_combine(Hashes);
+}
+
+if (const auto *Seq = dyn_cast(C)) {
+  Hashes.emplace_back(xxh3_64bits(Seq->getRawDataValues()));
+  return stable_hash_combine(Hashes);
+}
+
+switch (C->getValueID()) {
+case Value::UndefValueVal:
+case Value::PoisonValueVal:
+case Value::ConstantTokenNoneVal: {
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantIntVal: {
+  const APInt &Int = cast(C)->getValue();
+  Hashes.emplace_back(hashAPInt(Int));
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantFPVal: {
+  const APFloat &APF = cast(C)->getValueAPF();
+  Hashes.emplace_back(hashAPFloat(APF));
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantArrayVal: {
+  const ConstantArray *A = cast(C);
+  uint64_t NumElements = cast(Ty)->getNumElements();
+  Hashes.emplace_back(NumElements);

kyulee-com wrote:

Yeah. We could remove the count.

https://github.com/llvm/llvm-project/pull/112638


[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-19 Thread Kyungwoo Lee via llvm-branch-commits


@@ -47,24 +60,140 @@ class StructuralHashImpl {
 
 public:
   StructuralHashImpl() = delete;
-  explicit StructuralHashImpl(bool DetailedHash) : DetailedHash(DetailedHash) {}
+  explicit StructuralHashImpl(bool DetailedHash,
+  IgnoreOperandFunc IgnoreOp = nullptr)
+  : DetailedHash(DetailedHash), IgnoreOp(IgnoreOp) {
+if (IgnoreOp) {
+  IndexInstruction = std::make_unique();
+  IndexOperandHashMap = std::make_unique();
+}
+  }
 
-  stable_hash hashConstant(Constant *C) {
+  stable_hash hashAPInt(const APInt &I) {
 SmallVector Hashes;
-// TODO: hashArbitaryType() is not stable.
-if (ConstantInt *ConstInt = dyn_cast(C)) {
-  Hashes.emplace_back(hashArbitaryType(ConstInt->getValue()));
-} else if (ConstantFP *ConstFP = dyn_cast(C)) {
-  Hashes.emplace_back(hashArbitaryType(ConstFP->getValue()));
-} else if (Function *Func = dyn_cast(C))
-  // Hashing the name will be deterministic as LLVM's hashing infrastructure
-  // has explicit support for hashing strings and will not simply hash
-  // the pointer.
-  Hashes.emplace_back(hashArbitaryType(Func->getName()));
+Hashes.emplace_back(I.getBitWidth());
+for (unsigned J = 0; J < I.getNumWords(); ++J)
+  Hashes.emplace_back((I.getRawData())[J]);
+return stable_hash_combine(Hashes);
+  }
 
+  stable_hash hashAPFloat(const APFloat &F) {
+SmallVector Hashes;
+const fltSemantics &S = F.getSemantics();
+Hashes.emplace_back(APFloat::semanticsPrecision(S));
+Hashes.emplace_back(APFloat::semanticsMaxExponent(S));
+Hashes.emplace_back(APFloat::semanticsMinExponent(S));
+Hashes.emplace_back(APFloat::semanticsSizeInBits(S));
+Hashes.emplace_back(hashAPInt(F.bitcastToAPInt()));
 return stable_hash_combine(Hashes);
   }
 
+  stable_hash hashGlobalValue(const GlobalValue *GV) {
+if (!GV->hasName())
+  return 0;
+return stable_hash_name(GV->getName());

kyulee-com wrote:

`stable_hash_name` itself already handles it by calling `get_stable_name`.
https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/StableHashing.h#L55-L74

https://github.com/llvm/llvm-project/pull/112638


[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-19 Thread Kyungwoo Lee via llvm-branch-commits


@@ -47,24 +60,140 @@ class StructuralHashImpl {
 
 public:
   StructuralHashImpl() = delete;
-  explicit StructuralHashImpl(bool DetailedHash) : DetailedHash(DetailedHash) {}
+  explicit StructuralHashImpl(bool DetailedHash,
+  IgnoreOperandFunc IgnoreOp = nullptr)
+  : DetailedHash(DetailedHash), IgnoreOp(IgnoreOp) {
+if (IgnoreOp) {
+  IndexInstruction = std::make_unique();
+  IndexOperandHashMap = std::make_unique();
+}
+  }
 
-  stable_hash hashConstant(Constant *C) {
+  stable_hash hashAPInt(const APInt &I) {
 SmallVector Hashes;
-// TODO: hashArbitaryType() is not stable.
-if (ConstantInt *ConstInt = dyn_cast(C)) {
-  Hashes.emplace_back(hashArbitaryType(ConstInt->getValue()));
-} else if (ConstantFP *ConstFP = dyn_cast(C)) {
-  Hashes.emplace_back(hashArbitaryType(ConstFP->getValue()));
-} else if (Function *Func = dyn_cast(C))
-  // Hashing the name will be deterministic as LLVM's hashing infrastructure
-  // has explicit support for hashing strings and will not simply hash
-  // the pointer.
-  Hashes.emplace_back(hashArbitaryType(Func->getName()));
+Hashes.emplace_back(I.getBitWidth());
+for (unsigned J = 0; J < I.getNumWords(); ++J)
+  Hashes.emplace_back((I.getRawData())[J]);
+return stable_hash_combine(Hashes);
+  }
 
+  stable_hash hashAPFloat(const APFloat &F) {
+SmallVector Hashes;
+const fltSemantics &S = F.getSemantics();
+Hashes.emplace_back(APFloat::semanticsPrecision(S));
+Hashes.emplace_back(APFloat::semanticsMaxExponent(S));
+Hashes.emplace_back(APFloat::semanticsMinExponent(S));
+Hashes.emplace_back(APFloat::semanticsSizeInBits(S));
+Hashes.emplace_back(hashAPInt(F.bitcastToAPInt()));
 return stable_hash_combine(Hashes);
   }
 
+  stable_hash hashGlobalValue(const GlobalValue *GV) {
+if (!GV->hasName())
+  return 0;
+return stable_hash_name(GV->getName());
+  }
+
+  // Compute a hash for a Constant. This function is logically similar to
+  // FunctionComparator::cmpConstants() in FunctionComparator.cpp, but here
+  // we're interested in computing a hash rather than comparing two Constants.
+  // Some of the logic is simplified, e.g, we don't expand GEPOperator.
+  stable_hash hashConstant(Constant *C) {
+SmallVector Hashes;
+
+Type *Ty = C->getType();
+Hashes.emplace_back(hashType(Ty));
+
+if (C->isNullValue()) {
+  Hashes.emplace_back(static_cast('N'));
+  return stable_hash_combine(Hashes);
+}
+
+auto *G = dyn_cast(C);
+if (G) {
+  Hashes.emplace_back(hashGlobalValue(G));
+  return stable_hash_combine(Hashes);
+}
+
+if (const auto *Seq = dyn_cast(C)) {
+  Hashes.emplace_back(xxh3_64bits(Seq->getRawDataValues()));
+  return stable_hash_combine(Hashes);
+}
+
+switch (C->getValueID()) {
+case Value::UndefValueVal:
+case Value::PoisonValueVal:
+case Value::ConstantTokenNoneVal: {
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantIntVal: {
+  const APInt &Int = cast(C)->getValue();
+  Hashes.emplace_back(hashAPInt(Int));
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantFPVal: {
+  const APFloat &APF = cast(C)->getValueAPF();
+  Hashes.emplace_back(hashAPFloat(APF));
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantArrayVal: {
+  const ConstantArray *A = cast(C);
+  uint64_t NumElements = cast(Ty)->getNumElements();
+  Hashes.emplace_back(NumElements);
+  for (auto &Op : A->operands()) {
+auto H = hashConstant(cast(Op));
+Hashes.emplace_back(H);
+  }
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantStructVal: {
+  const ConstantStruct *S = cast(C);
+  unsigned NumElements = cast(Ty)->getNumElements();
+  Hashes.emplace_back(NumElements);
+  for (auto &Op : S->operands()) {
+auto H = hashConstant(cast(Op));
+Hashes.emplace_back(H);
+  }
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantVectorVal: {
+  const ConstantVector *V = cast(C);
+  unsigned NumElements = cast(Ty)->getNumElements();
+  Hashes.emplace_back(NumElements);
+  for (auto &Op : V->operands()) {
+auto H = hashConstant(cast(Op));
+Hashes.emplace_back(H);
+  }
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantExprVal: {
+  const ConstantExpr *E = cast(C);
+  unsigned NumOperands = E->getNumOperands();
+  Hashes.emplace_back(NumOperands);
+  for (auto &Op : E->operands()) {
+auto H = hashConstant(cast(Op));
+Hashes.emplace_back(H);
+  }
+  return stable_hash_combine(Hashes);
+}
+case Value::BlockAddressVal: {
+  const BlockAddress *BA = cast(C);
+  auto H = hashGlobalValue(BA->getFunction());
+  Hashes.emplace_back(H);
+  return stable_hash_co

[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-19 Thread Kyungwoo Lee via llvm-branch-commits

kyulee-com wrote:

> IIRC we have several lit tests that cover structural hash, shouldn't we have 
> a new test there that uses the new functionality?

Extended the existing `StructuralHashPrinterPass` with `Options`, and updated 
the corresponding lit test accordingly.

https://github.com/llvm/llvm-project/pull/112638


[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-19 Thread Kyungwoo Lee via llvm-branch-commits


@@ -47,24 +60,140 @@ class StructuralHashImpl {
 
 public:
   StructuralHashImpl() = delete;
-  explicit StructuralHashImpl(bool DetailedHash) : DetailedHash(DetailedHash) {}
+  explicit StructuralHashImpl(bool DetailedHash,
+  IgnoreOperandFunc IgnoreOp = nullptr)
+  : DetailedHash(DetailedHash), IgnoreOp(IgnoreOp) {
+if (IgnoreOp) {
+  IndexInstruction = std::make_unique();
+  IndexOperandHashMap = std::make_unique();
+}
+  }
 
-  stable_hash hashConstant(Constant *C) {
+  stable_hash hashAPInt(const APInt &I) {
 SmallVector Hashes;
-// TODO: hashArbitaryType() is not stable.
-if (ConstantInt *ConstInt = dyn_cast(C)) {
-  Hashes.emplace_back(hashArbitaryType(ConstInt->getValue()));
-} else if (ConstantFP *ConstFP = dyn_cast(C)) {
-  Hashes.emplace_back(hashArbitaryType(ConstFP->getValue()));
-} else if (Function *Func = dyn_cast(C))
-  // Hashing the name will be deterministic as LLVM's hashing infrastructure
-  // has explicit support for hashing strings and will not simply hash
-  // the pointer.
-  Hashes.emplace_back(hashArbitaryType(Func->getName()));
+Hashes.emplace_back(I.getBitWidth());
+for (unsigned J = 0; J < I.getNumWords(); ++J)
+  Hashes.emplace_back((I.getRawData())[J]);
+return stable_hash_combine(Hashes);
+  }
 
+  stable_hash hashAPFloat(const APFloat &F) {
+SmallVector Hashes;
+const fltSemantics &S = F.getSemantics();
+Hashes.emplace_back(APFloat::semanticsPrecision(S));
+Hashes.emplace_back(APFloat::semanticsMaxExponent(S));
+Hashes.emplace_back(APFloat::semanticsMinExponent(S));
+Hashes.emplace_back(APFloat::semanticsSizeInBits(S));
+Hashes.emplace_back(hashAPInt(F.bitcastToAPInt()));

kyulee-com wrote:

Yeah. We could simplify it.

https://github.com/llvm/llvm-project/pull/112638


[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-19 Thread Kyungwoo Lee via llvm-branch-commits


@@ -47,24 +60,140 @@ class StructuralHashImpl {
 
 public:
   StructuralHashImpl() = delete;
-  explicit StructuralHashImpl(bool DetailedHash) : DetailedHash(DetailedHash) {}
+  explicit StructuralHashImpl(bool DetailedHash,
+  IgnoreOperandFunc IgnoreOp = nullptr)
+  : DetailedHash(DetailedHash), IgnoreOp(IgnoreOp) {
+if (IgnoreOp) {
+  IndexInstruction = std::make_unique();
+  IndexOperandHashMap = std::make_unique();
+}
+  }
 
-  stable_hash hashConstant(Constant *C) {
+  stable_hash hashAPInt(const APInt &I) {
 SmallVector Hashes;
-// TODO: hashArbitaryType() is not stable.
-if (ConstantInt *ConstInt = dyn_cast(C)) {
-  Hashes.emplace_back(hashArbitaryType(ConstInt->getValue()));
-} else if (ConstantFP *ConstFP = dyn_cast(C)) {
-  Hashes.emplace_back(hashArbitaryType(ConstFP->getValue()));
-} else if (Function *Func = dyn_cast(C))
-  // Hashing the name will be deterministic as LLVM's hashing infrastructure
-  // has explicit support for hashing strings and will not simply hash
-  // the pointer.
-  Hashes.emplace_back(hashArbitaryType(Func->getName()));
+Hashes.emplace_back(I.getBitWidth());
+for (unsigned J = 0; J < I.getNumWords(); ++J)
+  Hashes.emplace_back((I.getRawData())[J]);
+return stable_hash_combine(Hashes);
+  }
 
+  stable_hash hashAPFloat(const APFloat &F) {
+SmallVector Hashes;
+const fltSemantics &S = F.getSemantics();
+Hashes.emplace_back(APFloat::semanticsPrecision(S));
+Hashes.emplace_back(APFloat::semanticsMaxExponent(S));
+Hashes.emplace_back(APFloat::semanticsMinExponent(S));
+Hashes.emplace_back(APFloat::semanticsSizeInBits(S));
+Hashes.emplace_back(hashAPInt(F.bitcastToAPInt()));
 return stable_hash_combine(Hashes);
   }
 
+  stable_hash hashGlobalValue(const GlobalValue *GV) {
+if (!GV->hasName())
+  return 0;
+return stable_hash_name(GV->getName());
+  }
+
+  // Compute a hash for a Constant. This function is logically similar to
+  // FunctionComparator::cmpConstants() in FunctionComparator.cpp, but here
+  // we're interested in computing a hash rather than comparing two Constants.
+  // Some of the logic is simplified, e.g, we don't expand GEPOperator.
+  stable_hash hashConstant(Constant *C) {
+SmallVector Hashes;
+
+Type *Ty = C->getType();
+Hashes.emplace_back(hashType(Ty));
+
+if (C->isNullValue()) {
+  Hashes.emplace_back(static_cast('N'));
+  return stable_hash_combine(Hashes);
+}
+
+auto *G = dyn_cast(C);
+if (G) {
+  Hashes.emplace_back(hashGlobalValue(G));
+  return stable_hash_combine(Hashes);
+}
+
+if (const auto *Seq = dyn_cast(C)) {
+  Hashes.emplace_back(xxh3_64bits(Seq->getRawDataValues()));
+  return stable_hash_combine(Hashes);
+}
+
+switch (C->getValueID()) {
+case Value::UndefValueVal:
+case Value::PoisonValueVal:
+case Value::ConstantTokenNoneVal: {
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantIntVal: {
+  const APInt &Int = cast(C)->getValue();
+  Hashes.emplace_back(hashAPInt(Int));
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantFPVal: {
+  const APFloat &APF = cast(C)->getValueAPF();
+  Hashes.emplace_back(hashAPFloat(APF));
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantArrayVal: {
+  const ConstantArray *A = cast(C);
+  uint64_t NumElements = cast(Ty)->getNumElements();
+  Hashes.emplace_back(NumElements);
+  for (auto &Op : A->operands()) {
+auto H = hashConstant(cast(Op));
+Hashes.emplace_back(H);
+  }
+  return stable_hash_combine(Hashes);
+}
+case Value::ConstantStructVal: {
+  const ConstantStruct *S = cast(C);
+  unsigned NumElements = cast(Ty)->getNumElements();
+  Hashes.emplace_back(NumElements);
+  for (auto &Op : S->operands()) {
+auto H = hashConstant(cast(Op));
+Hashes.emplace_back(H);
+  }
+  return stable_hash_combine(Hashes);

kyulee-com wrote:

Most cases are simply grouped.

https://github.com/llvm/llvm-project/pull/112638


[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-19 Thread Kyungwoo Lee via llvm-branch-commits


@@ -100,8 +233,20 @@ class StructuralHashImpl {
 if (const auto *ComparisonInstruction = dyn_cast(&Inst))
   Hashes.emplace_back(ComparisonInstruction->getPredicate());
 
-for (const auto &Op : Inst.operands())
-  Hashes.emplace_back(hashOperand(Op));
+unsigned InstIdx = 0;
+if (IndexInstruction) {
+  InstIdx = IndexInstruction->size();
+  IndexInstruction->insert({InstIdx, const_cast(&Inst)});

kyulee-com wrote:

Each instruction is inserted only once by design in this pass. In fact, the 
`IndexInstruction` map itself can't catch duplicates, since its key is the 
index, not the `Instruction *`. Anyhow, I replaced `insert` with `try_emplace` 
for efficiency.
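
To illustrate the point about the key, here is a rough sketch that assumes `std::map` as a stand-in for `MapVector`. `ToyInstruction`, `ToyIndexInstrMap` and `appendInstruction` are hypothetical names for this example only, not code from the PR.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for illustration only; not llvm::Instruction.
struct ToyInstruction {
  std::string Name;
};

// Keyed by the running index, like IndexInstruction in the PR, but using
// std::map instead of MapVector (an assumption made to stay self-contained).
using ToyIndexInstrMap = std::map<unsigned, const ToyInstruction *>;

// Appends Inst under the next free index and returns that index.
inline unsigned appendInstruction(ToyIndexInstrMap &Map,
                                  const ToyInstruction *Inst) {
  unsigned Idx = static_cast<unsigned>(Map.size());
  // The key is always fresh here, so try_emplace always inserts. The map
  // never compares the pointers, so it cannot notice a repeated instruction.
  bool Inserted = Map.try_emplace(Idx, Inst).second;
  assert(Inserted && "a fresh index can never collide");
  (void)Inserted;
  return Idx;
}
```

Calling `appendInstruction` twice with the same pointer yields indices 0 and 1 and a map of size 2, so uniqueness has to come from the traversal visiting each instruction once, not from the map.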

https://github.com/llvm/llvm-project/pull/112638


[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-19 Thread Kyungwoo Lee via llvm-branch-commits

kyulee-com wrote:

The test failure `TableGen/x86-fold-tables.td` seems unrelated.

https://github.com/llvm/llvm-project/pull/112638


[llvm-branch-commits] [llvm] [CGData] Global Merge Functions (PR #112671)

2024-10-18 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/112671

From ded5771bb4ff7c8fd5401b4efe0af988539a8162 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Fri, 30 Aug 2024 00:09:09 -0700
Subject: [PATCH 1/2] [CGData] Global Merge Functions

---
 llvm/include/llvm/CGData/CodeGenData.h|  11 +
 llvm/include/llvm/InitializePasses.h  |   1 +
 llvm/include/llvm/LinkAllPasses.h |   1 +
 llvm/include/llvm/Passes/CodeGenPassBuilder.h |   1 +
 llvm/include/llvm/Transforms/IPO.h|   2 +
 .../Transforms/IPO/GlobalMergeFunctions.h |  77 ++
 llvm/lib/CodeGen/TargetPassConfig.cpp |   3 +
 llvm/lib/LTO/LTO.cpp  |   1 +
 llvm/lib/Transforms/IPO/CMakeLists.txt|   2 +
 .../Transforms/IPO/GlobalMergeFunctions.cpp   | 687 ++
 .../ThinLTO/AArch64/cgdata-merge-local.ll |  62 ++
 .../test/ThinLTO/AArch64/cgdata-merge-read.ll |  82 +++
 .../AArch64/cgdata-merge-two-rounds.ll|  68 ++
 .../ThinLTO/AArch64/cgdata-merge-write.ll |  97 +++
 llvm/tools/llvm-lto2/CMakeLists.txt   |   1 +
 llvm/tools/llvm-lto2/llvm-lto2.cpp|   6 +
 16 files changed, 1102 insertions(+)
 create mode 100644 llvm/include/llvm/Transforms/IPO/GlobalMergeFunctions.h
 create mode 100644 llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-local.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-read.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-two-rounds.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-write.ll

diff --git a/llvm/include/llvm/CGData/CodeGenData.h b/llvm/include/llvm/CGData/CodeGenData.h
index 5d7c74725ccef1..da0e412f2a0e03 100644
--- a/llvm/include/llvm/CGData/CodeGenData.h
+++ b/llvm/include/llvm/CGData/CodeGenData.h
@@ -145,6 +145,9 @@ class CodeGenData {
   const OutlinedHashTree *getOutlinedHashTree() {
 return PublishedHashTree.get();
   }
+  const StableFunctionMap *getStableFunctionMap() {
+return PublishedStableFunctionMap.get();
+  }
 
   /// Returns true if we should write codegen data.
   bool emitCGData() { return EmitCGData; }
@@ -169,10 +172,18 @@ inline bool hasOutlinedHashTree() {
   return CodeGenData::getInstance().hasOutlinedHashTree();
 }
 
+inline bool hasStableFunctionMap() {
+  return CodeGenData::getInstance().hasStableFunctionMap();
+}
+
 inline const OutlinedHashTree *getOutlinedHashTree() {
   return CodeGenData::getInstance().getOutlinedHashTree();
 }
 
+inline const StableFunctionMap *getStableFunctionMap() {
+  return CodeGenData::getInstance().getStableFunctionMap();
+}
+
 inline bool emitCGData() { return CodeGenData::getInstance().emitCGData(); }
 
 inline void
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 4352099d6dbb99..9aa36d5bb7f801 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -123,6 +123,7 @@ void initializeGCEmptyBasicBlocksPass(PassRegistry &);
 void initializeGCMachineCodeAnalysisPass(PassRegistry &);
 void initializeGCModuleInfoPass(PassRegistry &);
 void initializeGVNLegacyPassPass(PassRegistry &);
+void initializeGlobalMergeFuncPass(PassRegistry &);
 void initializeGlobalMergePass(PassRegistry &);
 void initializeGlobalsAAWrapperPassPass(PassRegistry &);
 void initializeHardwareLoopsLegacyPass(PassRegistry &);
diff --git a/llvm/include/llvm/LinkAllPasses.h b/llvm/include/llvm/LinkAllPasses.h
index 92b59a66567c95..ea3609a2b4bc71 100644
--- a/llvm/include/llvm/LinkAllPasses.h
+++ b/llvm/include/llvm/LinkAllPasses.h
@@ -79,6 +79,7 @@ struct ForcePassLinking {
 (void)llvm::createDomOnlyViewerWrapperPassPass();
 (void)llvm::createDomViewerWrapperPassPass();
 (void)llvm::createAlwaysInlinerLegacyPass();
+(void)llvm::createGlobalMergeFuncPass();
 (void)llvm::createGlobalsAAWrapperPass();
 (void)llvm::createInstSimplifyLegacyPass();
 (void)llvm::createInstructionCombiningPass();
diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index 13bc4700d87029..96b5b815132bc0 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -74,6 +74,7 @@
 #include "llvm/Target/CGPassBuilderOption.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/CFGuard.h"
+#include "llvm/Transforms/IPO/GlobalMergeFunctions.h"
 #include "llvm/Transforms/Scalar/ConstantHoisting.h"
 #include "llvm/Transforms/Scalar/LoopPassManager.h"
 #include "llvm/Transforms/Scalar/LoopStrengthReduce.h"
diff --git a/llvm/include/llvm/Transforms/IPO.h b/llvm/include/llvm/Transforms/IPO.h
index ee0e35aa618325..86a8654f56997c 100644
--- a/llvm/include/llvm/Transforms/IPO.h
+++ b/llvm/include/llvm/Transforms/IPO.h
@@ -55,6 +55,8 @@ enum class PassSummaryAction {
   Export, ///< Export information to summary.
 };
 
+Pass *createGlobalMergeF

[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-21 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/112638

>From 6225d74229d41068c57109a24b063f6fcba13985 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Wed, 16 Oct 2024 17:09:07 -0700
Subject: [PATCH 1/4] [StructuralHash] Support Differences

This computes a structural hash while allowing for selective ignoring of
certain operands based on a custom function that is provided.
Instead of a single hash value, it now returns FunctionHashInfo which
includes a hash value, an instruction mapping, and a map to track the
operand location and its corresponding hash value that is ignored.
---
 llvm/include/llvm/IR/StructuralHash.h|  46 ++
 llvm/lib/IR/StructuralHash.cpp   | 188 +--
 llvm/unittests/IR/StructuralHashTest.cpp |  55 +++
 3 files changed, 275 insertions(+), 14 deletions(-)

diff --git a/llvm/include/llvm/IR/StructuralHash.h b/llvm/include/llvm/IR/StructuralHash.h
index aa292bc3446799..bc82c204c4d1f6 100644
--- a/llvm/include/llvm/IR/StructuralHash.h
+++ b/llvm/include/llvm/IR/StructuralHash.h
@@ -14,7 +14,9 @@
 #ifndef LLVM_IR_STRUCTURALHASH_H
 #define LLVM_IR_STRUCTURALHASH_H
 
+#include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/StableHashing.h"
+#include "llvm/IR/Instruction.h"
 #include 
 
 namespace llvm {
@@ -23,6 +25,7 @@ class Function;
 class Module;
 
 using IRHash = stable_hash;
+using OpndHash = stable_hash;
 
 /// Returns a hash of the function \p F.
 /// \param F The function to hash.
@@ -37,6 +40,49 @@ IRHash StructuralHash(const Function &F, bool DetailedHash = false);
 /// composed the module hash.
 IRHash StructuralHash(const Module &M, bool DetailedHash = false);
 
+/// The pair of an instruction index and an operand index.
+using IndexPair = std::pair<unsigned, unsigned>;
+
+/// A map from an instruction index to an instruction pointer.
+using IndexInstrMap = MapVector<unsigned, Instruction *>;
+
+/// A map from an IndexPair to an OpndHash.
+using IndexOperandHashMapType = DenseMap<IndexPair, OpndHash>;
+
+/// A function that takes an instruction and an operand index and returns true
+/// if the operand should be ignored in the function hash computation.
+using IgnoreOperandFunc = std::function<bool(const Instruction *, unsigned)>;
+
+struct FunctionHashInfo {
+  /// A hash value representing the structural content of the function
+  IRHash FunctionHash;
+  /// A mapping from instruction indices to instruction pointers
+  std::unique_ptr<IndexInstrMap> IndexInstruction;
+  /// A mapping from pairs of instruction indices and operand indices
+  /// to the hashes of the operands. This can be used to analyze or
+  /// reconstruct the differences in ignored operands
+  std::unique_ptr<IndexOperandHashMapType> IndexOperandHashMap;
+
+  FunctionHashInfo(IRHash FuntionHash,
+                   std::unique_ptr<IndexInstrMap> IndexInstruction,
+                   std::unique_ptr<IndexOperandHashMapType> IndexOperandHashMap)
+      : FunctionHash(FuntionHash),
+        IndexInstruction(std::move(IndexInstruction)),
+        IndexOperandHashMap(std::move(IndexOperandHashMap)) {}
+};
+
+/// Computes a structural hash of a given function, considering the structure
+/// and content of the function's instructions while allowing for selective
+/// ignoring of certain operands based on custom criteria. This hash can be used
+/// to identify functions that are structurally similar or identical, which is
+/// useful in optimizations, deduplication, or analysis tasks.
+/// \param F The function to hash.
+/// \param IgnoreOp A callable that takes an instruction and an operand index,
+/// and returns true if the operand should be ignored in the hash computation.
+/// \return A FunctionHashInfo structure
+FunctionHashInfo StructuralHashWithDifferences(const Function &F,
+   IgnoreOperandFunc IgnoreOp);
+
 } // end namespace llvm
 
 #endif
diff --git a/llvm/lib/IR/StructuralHash.cpp b/llvm/lib/IR/StructuralHash.cpp
index a1fabab77d52b2..6e0af666010a05 100644
--- a/llvm/lib/IR/StructuralHash.cpp
+++ b/llvm/lib/IR/StructuralHash.cpp
@@ -28,6 +28,19 @@ class StructuralHashImpl {
 
   bool DetailedHash;
 
+  /// IgnoreOp is a function that returns true if the operand should be ignored.
+  IgnoreOperandFunc IgnoreOp = nullptr;
+  /// A mapping from instruction indices to instruction pointers.
+  /// The index represents the position of an instruction based on the order in
+  /// which it is first encountered.
+  std::unique_ptr<IndexInstrMap> IndexInstruction = nullptr;
+  /// A mapping from pairs of instruction indices and operand indices
+  /// to the hashes of the operands.
+  std::unique_ptr<IndexOperandHashMapType> IndexOperandHashMap = nullptr;
+
+  /// Assign a unique ID to each Value in the order they are first seen.
+  DenseMap ValueToId;
+
   // This will produce different values on 32-bit and 64-bit systens as
   // hash_combine returns a size_t. However, this is only used for
   // detailed hashing which, in-tree, only needs to distinguish between
@@ -47,24 +60,140 @@ class StructuralHashImpl {
 
 public:
   StructuralHashImpl() = delete;
-  explicit StructuralHashImpl(bool DetailedHash) : DetailedHas

[llvm-branch-commits] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)

2024-10-16 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/112664

>From 3b73ee558d57434ee1f8447ac2509db371d95d8f Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Mon, 9 Sep 2024 19:38:05 -0700
Subject: [PATCH] [CGData][llvm-cgdata] Support for stable function map

This introduces a new cgdata format for stable function maps.
The raw data is embedded in the __llvm_merge section during compile time.
This data can be read and merged using the llvm-cgdata tool, into an indexed 
cgdata file. Consequently, the tool is now capable of handling either outlined 
hash trees, stable function maps, or both, as they are orthogonal.
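
To illustrate why the two data kinds can live side by side in one file, here is a minimal standalone sketch of a bitmask enum. It only mirrors the values shown in the patch's `CGDataKind`; the `ToyDataKind` type and the `hasKind` helper are hypothetical names, and the hand-written `operator|` is an assumption standing in for the operators that `LLVM_MARK_AS_BITMASK_ENUM` enables in LLVM.

```cpp
#include <cstdint>

// Hypothetical mirror of the kinds in the patch; not the LLVM definition.
enum class ToyDataKind : uint32_t {
  Unknown = 0x0,
  FunctionOutlinedHashTree = 0x1, // function outlining info
  StableFunctionMergingMap = 0x2, // function merging info
};

// Each kind owns one bit, so a file can advertise any combination of kinds.
constexpr ToyDataKind operator|(ToyDataKind A, ToyDataKind B) {
  return static_cast<ToyDataKind>(static_cast<uint32_t>(A) |
                                  static_cast<uint32_t>(B));
}

// Returns true if the bit for kind K is set in Set.
constexpr bool hasKind(ToyDataKind Set, ToyDataKind K) {
  return (static_cast<uint32_t>(Set) & static_cast<uint32_t>(K)) != 0;
}

static_assert(hasKind(ToyDataKind::FunctionOutlinedHashTree |
                          ToyDataKind::StableFunctionMergingMap,
                      ToyDataKind::StableFunctionMergingMap),
              "both kinds can be present at once");
```

Because the kinds occupy separate bits, a reader can test for each payload independently, which is one way to read "orthogonal" in the message above.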
---
 llvm/docs/CommandGuide/llvm-cgdata.rst| 16 ++--
 llvm/include/llvm/CGData/CodeGenData.h| 24 +-
 llvm/include/llvm/CGData/CodeGenData.inc  | 12 ++-
 llvm/include/llvm/CGData/CodeGenDataReader.h  | 29 ++-
 llvm/include/llvm/CGData/CodeGenDataWriter.h  | 17 +++-
 llvm/lib/CGData/CodeGenData.cpp   | 30 ---
 llvm/lib/CGData/CodeGenDataReader.cpp | 63 +-
 llvm/lib/CGData/CodeGenDataWriter.cpp | 30 ++-
 llvm/test/tools/llvm-cgdata/empty.test|  8 +-
 llvm/test/tools/llvm-cgdata/error.test| 13 +--
 .../merge-combined-funcmap-hashtree.test  | 66 +++
 .../llvm-cgdata/merge-funcmap-archive.test| 83 +++
 .../llvm-cgdata/merge-funcmap-concat.test | 78 +
 .../llvm-cgdata/merge-funcmap-double.test | 79 ++
 .../llvm-cgdata/merge-funcmap-single.test | 36 
 ...chive.test => merge-hashtree-archive.test} |  8 +-
 ...concat.test => merge-hashtree-concat.test} |  6 +-
 ...double.test => merge-hashtree-double.test} |  8 +-
 ...single.test => merge-hashtree-single.test} |  4 +-
 llvm/tools/llvm-cgdata/llvm-cgdata.cpp| 46 +++---
 20 files changed, 572 insertions(+), 84 deletions(-)
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-combined-funcmap-hashtree.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-archive.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-concat.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-double.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-single.test
 rename llvm/test/tools/llvm-cgdata/{merge-archive.test => merge-hashtree-archive.test} (91%)
 rename llvm/test/tools/llvm-cgdata/{merge-concat.test => merge-hashtree-concat.test} (93%)
 rename llvm/test/tools/llvm-cgdata/{merge-double.test => merge-hashtree-double.test} (90%)
 rename llvm/test/tools/llvm-cgdata/{merge-single.test => merge-hashtree-single.test} (92%)

diff --git a/llvm/docs/CommandGuide/llvm-cgdata.rst b/llvm/docs/CommandGuide/llvm-cgdata.rst
index f592e1508844ee..0670decd087e39 100644
--- a/llvm/docs/CommandGuide/llvm-cgdata.rst
+++ b/llvm/docs/CommandGuide/llvm-cgdata.rst
@@ -11,15 +11,13 @@ SYNOPSIS
 DESCRIPTION
 ---
 
-The :program:llvm-cgdata utility parses raw codegen data embedded
-in compiled binary files and merges them into a single .cgdata file.
-It can also inspect and manipulate .cgdata files.
-Currently, the tool supports saving and restoring outlined hash trees,
-enabling global function outlining across modules, allowing for more
-efficient function outlining in subsequent compilations.
-The design is extensible, allowing for the incorporation of additional
-codegen summaries and optimization techniques, such as global function
-merging, in the future.
+The :program:llvm-cgdata utility parses raw codegen data embedded in compiled
+binary files and merges them into a single .cgdata file. It can also inspect
+and manipulate .cgdata files. Currently, the tool supports saving and restoring
+outlined hash trees and stable function maps, allowing for more efficient
+function outlining and function merging across modules in subsequent
+compilations. The design is extensible, allowing for the incorporation of
+additional codegen summaries and optimization techniques.
 
 COMMANDS
 
diff --git a/llvm/include/llvm/CGData/CodeGenData.h b/llvm/include/llvm/CGData/CodeGenData.h
index 53550beeae1f83..5d7c74725ccef1 100644
--- a/llvm/include/llvm/CGData/CodeGenData.h
+++ b/llvm/include/llvm/CGData/CodeGenData.h
@@ -19,6 +19,7 @@
 #include "llvm/Bitcode/BitcodeReader.h"
 #include "llvm/CGData/OutlinedHashTree.h"
 #include "llvm/CGData/OutlinedHashTreeRecord.h"
+#include "llvm/CGData/StableFunctionMapRecord.h"
 #include "llvm/IR/Module.h"
 #include "llvm/Object/ObjectFile.h"
 #include "llvm/Support/Caching.h"
@@ -41,7 +42,9 @@ enum class CGDataKind {
   Unknown = 0x0,
   // A function outlining info.
   FunctionOutlinedHashTree = 0x1,
-  LLVM_MARK_AS_BITMASK_ENUM(/*LargestValue=*/FunctionOutlinedHashTree)
+  // A function merging info.
+  StableFunctionMergingMap = 0x2,
+  LLVM_MARK_AS_BITMASK_ENUM(/*LargestValue=*/StableFunctionMergingMap)
 };
 
 const std::error_category &cgdata_category();
@@ -108,6 +111,8 

[llvm-branch-commits] [lld] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)

2024-10-17 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/112664

>From f6fc25953b8f5109abb968c43ebc7d53f2e475db Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Mon, 9 Sep 2024 19:38:05 -0700
Subject: [PATCH] [CGData][llvm-cgdata] Support for stable function map

This introduces a new cgdata format for stable function maps.
The raw data is embedded in the __llvm_merge section during compile time.
This data can be read and merged using the llvm-cgdata tool, into an indexed 
cgdata file. Consequently, the tool is now capable of handling either outlined 
hash trees, stable function maps, or both, as they are orthogonal.
---
 lld/test/MachO/cgdata-generate.s  |  6 +-
 llvm/docs/CommandGuide/llvm-cgdata.rst| 16 ++--
 llvm/include/llvm/CGData/CodeGenData.h| 24 +-
 llvm/include/llvm/CGData/CodeGenData.inc  | 12 ++-
 llvm/include/llvm/CGData/CodeGenDataReader.h  | 29 ++-
 llvm/include/llvm/CGData/CodeGenDataWriter.h  | 17 +++-
 llvm/lib/CGData/CodeGenData.cpp   | 30 ---
 llvm/lib/CGData/CodeGenDataReader.cpp | 63 +-
 llvm/lib/CGData/CodeGenDataWriter.cpp | 30 ++-
 llvm/test/tools/llvm-cgdata/empty.test|  8 +-
 llvm/test/tools/llvm-cgdata/error.test| 13 +--
 .../merge-combined-funcmap-hashtree.test  | 66 +++
 .../llvm-cgdata/merge-funcmap-archive.test| 83 +++
 .../llvm-cgdata/merge-funcmap-concat.test | 78 +
 .../llvm-cgdata/merge-funcmap-double.test | 79 ++
 .../llvm-cgdata/merge-funcmap-single.test | 36 
 ...chive.test => merge-hashtree-archive.test} |  8 +-
 ...concat.test => merge-hashtree-concat.test} |  6 +-
 ...double.test => merge-hashtree-double.test} |  8 +-
 ...single.test => merge-hashtree-single.test} |  4 +-
 llvm/tools/llvm-cgdata/llvm-cgdata.cpp| 48 ---
 21 files changed, 577 insertions(+), 87 deletions(-)
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-combined-funcmap-hashtree.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-archive.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-concat.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-double.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-single.test
 rename llvm/test/tools/llvm-cgdata/{merge-archive.test => merge-hashtree-archive.test} (91%)
 rename llvm/test/tools/llvm-cgdata/{merge-concat.test => merge-hashtree-concat.test} (93%)
 rename llvm/test/tools/llvm-cgdata/{merge-double.test => merge-hashtree-double.test} (90%)
 rename llvm/test/tools/llvm-cgdata/{merge-single.test => merge-hashtree-single.test} (92%)

diff --git a/lld/test/MachO/cgdata-generate.s b/lld/test/MachO/cgdata-generate.s
index 174df39d666c5d..f942ae07f64e0e 100644
--- a/lld/test/MachO/cgdata-generate.s
+++ b/lld/test/MachO/cgdata-generate.s
@@ -3,12 +3,12 @@
 
 # RUN: rm -rf %t; split-file %s %t
 
-# Synthesize raw cgdata without the header (24 byte) from the indexed cgdata.
+# Synthesize raw cgdata without the header (32 byte) from the indexed cgdata.
 # RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata
-# RUN: od -t x1 -j 24 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
+# RUN: od -t x1 -j 32 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
 # RUN: sed "s//$(cat %t/raw-1-bytes.txt)/g" %t/merge-template.s > %t/merge-1.s
 # RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata
-# RUN: od -t x1 -j 24 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt
+# RUN: od -t x1 -j 32 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt
 # RUN: sed "s//$(cat %t/raw-2-bytes.txt)/g" %t/merge-template.s > %t/merge-2.s
 
 # RUN: llvm-mc -filetype obj -triple arm64-apple-darwin %t/merge-1.s -o %t/merge-1.o
diff --git a/llvm/docs/CommandGuide/llvm-cgdata.rst b/llvm/docs/CommandGuide/llvm-cgdata.rst
index f592e1508844ee..0670decd087e39 100644
--- a/llvm/docs/CommandGuide/llvm-cgdata.rst
+++ b/llvm/docs/CommandGuide/llvm-cgdata.rst
@@ -11,15 +11,13 @@ SYNOPSIS
 DESCRIPTION
 ---
 
-The :program:llvm-cgdata utility parses raw codegen data embedded
-in compiled binary files and merges them into a single .cgdata file.
-It can also inspect and manipulate .cgdata files.
-Currently, the tool supports saving and restoring outlined hash trees,
-enabling global function outlining across modules, allowing for more
-efficient function outlining in subsequent compilations.
-The design is extensible, allowing for the incorporation of additional
-codegen summaries and optimization techniques, such as global function
-merging, in the futur

[llvm-branch-commits] [llvm] [CGData] Global Merge Functions (PR #112671)

2024-10-17 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com created 
https://github.com/llvm/llvm-project/pull/112671

None

>From 2a690c75924de5feadb4a582d76822b4d4d1d2cf Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Fri, 30 Aug 2024 00:09:09 -0700
Subject: [PATCH] [CGData] Global Merge Functions

---
 llvm/include/llvm/CGData/CodeGenData.h|  11 +
 llvm/include/llvm/InitializePasses.h  |   1 +
 llvm/include/llvm/LinkAllPasses.h |   1 +
 llvm/include/llvm/Passes/CodeGenPassBuilder.h |   1 +
 llvm/include/llvm/Transforms/IPO.h|   2 +
 .../Transforms/IPO/GlobalMergeFunctions.h |  73 ++
 llvm/lib/CodeGen/TargetPassConfig.cpp |   3 +
 llvm/lib/LTO/LTO.cpp  |   1 +
 llvm/lib/Transforms/IPO/CMakeLists.txt|   2 +
 .../Transforms/IPO/GlobalMergeFunctions.cpp   | 669 ++
 .../ThinLTO/AArch64/cgdata-merge-local.ll |  62 ++
 .../test/ThinLTO/AArch64/cgdata-merge-read.ll |  82 +++
 .../AArch64/cgdata-merge-two-rounds.ll|  68 ++
 .../ThinLTO/AArch64/cgdata-merge-write.ll |  97 +++
 llvm/tools/llvm-lto2/CMakeLists.txt   |   1 +
 llvm/tools/llvm-lto2/llvm-lto2.cpp|   6 +
 16 files changed, 1080 insertions(+)
 create mode 100644 llvm/include/llvm/Transforms/IPO/GlobalMergeFunctions.h
 create mode 100644 llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-local.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-read.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-two-rounds.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-write.ll

diff --git a/llvm/include/llvm/CGData/CodeGenData.h b/llvm/include/llvm/CGData/CodeGenData.h
index 5d7c74725ccef1..da0e412f2a0e03 100644
--- a/llvm/include/llvm/CGData/CodeGenData.h
+++ b/llvm/include/llvm/CGData/CodeGenData.h
@@ -145,6 +145,9 @@ class CodeGenData {
   const OutlinedHashTree *getOutlinedHashTree() {
 return PublishedHashTree.get();
   }
+  const StableFunctionMap *getStableFunctionMap() {
+return PublishedStableFunctionMap.get();
+  }
 
   /// Returns true if we should write codegen data.
   bool emitCGData() { return EmitCGData; }
@@ -169,10 +172,18 @@ inline bool hasOutlinedHashTree() {
   return CodeGenData::getInstance().hasOutlinedHashTree();
 }
 
+inline bool hasStableFunctionMap() {
+  return CodeGenData::getInstance().hasStableFunctionMap();
+}
+
 inline const OutlinedHashTree *getOutlinedHashTree() {
   return CodeGenData::getInstance().getOutlinedHashTree();
 }
 
+inline const StableFunctionMap *getStableFunctionMap() {
+  return CodeGenData::getInstance().getStableFunctionMap();
+}
+
 inline bool emitCGData() { return CodeGenData::getInstance().emitCGData(); }
 
 inline void
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 4352099d6dbb99..9aa36d5bb7f801 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -123,6 +123,7 @@ void initializeGCEmptyBasicBlocksPass(PassRegistry &);
 void initializeGCMachineCodeAnalysisPass(PassRegistry &);
 void initializeGCModuleInfoPass(PassRegistry &);
 void initializeGVNLegacyPassPass(PassRegistry &);
+void initializeGlobalMergeFuncPass(PassRegistry &);
 void initializeGlobalMergePass(PassRegistry &);
 void initializeGlobalsAAWrapperPassPass(PassRegistry &);
 void initializeHardwareLoopsLegacyPass(PassRegistry &);
diff --git a/llvm/include/llvm/LinkAllPasses.h b/llvm/include/llvm/LinkAllPasses.h
index 92b59a66567c95..ea3609a2b4bc71 100644
--- a/llvm/include/llvm/LinkAllPasses.h
+++ b/llvm/include/llvm/LinkAllPasses.h
@@ -79,6 +79,7 @@ struct ForcePassLinking {
 (void)llvm::createDomOnlyViewerWrapperPassPass();
 (void)llvm::createDomViewerWrapperPassPass();
 (void)llvm::createAlwaysInlinerLegacyPass();
+(void)llvm::createGlobalMergeFuncPass();
 (void)llvm::createGlobalsAAWrapperPass();
 (void)llvm::createInstSimplifyLegacyPass();
 (void)llvm::createInstructionCombiningPass();
diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index 13bc4700d87029..96b5b815132bc0 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -74,6 +74,7 @@
 #include "llvm/Target/CGPassBuilderOption.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/CFGuard.h"
+#include "llvm/Transforms/IPO/GlobalMergeFunctions.h"
 #include "llvm/Transforms/Scalar/ConstantHoisting.h"
 #include "llvm/Transforms/Scalar/LoopPassManager.h"
 #include "llvm/Transforms/Scalar/LoopStrengthReduce.h"
diff --git a/llvm/include/llvm/Transforms/IPO.h b/llvm/include/llvm/Transforms/IPO.h
index ee0e35aa618325..86a8654f56997c 100644
--- a/llvm/include/llvm/Transforms/IPO.h
+++ b/llvm/include/llvm/Transforms/IPO.h
@@ -55,6 +55,8 @@ enum class PassSummaryAction {
   Export, ///< Export information to summary.
 };
 
+Pass *createGlobalMerg

[llvm-branch-commits] [lld] [CGData][lld-macho] Add Global Merge Func Pass (PR #112674)

2024-10-17 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com created 
https://github.com/llvm/llvm-project/pull/112674

None

>From 36978c1da750496941705b284b3c34495b6f7386 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Wed, 16 Oct 2024 22:56:38 -0700
Subject: [PATCH] [CGData][lld-macho] Add Global Merge Func Pass

---
 lld/MachO/CMakeLists.txt   |  2 +
 lld/MachO/Driver.cpp   | 18 +-
 lld/MachO/InputSection.h   |  1 +
 lld/MachO/LTO.cpp  |  7 +++
 lld/test/MachO/cgdata-generate-merge.s | 85 ++
 5 files changed, 112 insertions(+), 1 deletion(-)
 create mode 100644 lld/test/MachO/cgdata-generate-merge.s

diff --git a/lld/MachO/CMakeLists.txt b/lld/MachO/CMakeLists.txt
index ecf6ce609e59f2..137fe4939b4457 100644
--- a/lld/MachO/CMakeLists.txt
+++ b/lld/MachO/CMakeLists.txt
@@ -41,9 +41,11 @@ add_lld_library(lldMachO
   BitReader
   BitWriter
   CGData
+  CodeGen
   Core
   DebugInfoDWARF
   Demangle
+  IPO
   LTO
   MC
   ObjCARCOpts
diff --git a/lld/MachO/Driver.cpp b/lld/MachO/Driver.cpp
index ab4abb1fa97efc..59c24a06a2cb20 100644
--- a/lld/MachO/Driver.cpp
+++ b/lld/MachO/Driver.cpp
@@ -1326,7 +1326,8 @@ static void codegenDataGenerate() {
   TimeTraceScope timeScope("Generating codegen data");
 
   OutlinedHashTreeRecord globalOutlineRecord;
-  for (ConcatInputSection *isec : inputSections)
+  StableFunctionMapRecord globalMergeRecord;
+  for (ConcatInputSection *isec : inputSections) {
 if (isec->getSegName() == segment_names::data &&
 isec->getName() == section_names::outlinedHashTree) {
   // Read outlined hash tree from each section.
@@ -1337,10 +1338,25 @@ static void codegenDataGenerate() {
   // Merge it to the global hash tree.
   globalOutlineRecord.merge(localOutlineRecord);
 }
+if (isec->getSegName() == segment_names::data &&
+isec->getName() == section_names::functionmap) {
+  // Read stable functions from each section.
+  StableFunctionMapRecord localMergeRecord;
+  auto *data = isec->data.data();
+  localMergeRecord.deserialize(data);
+
+  // Merge it to the global function map.
+  globalMergeRecord.merge(localMergeRecord);
+}
+  }
+
+  globalMergeRecord.finalize();
 
   CodeGenDataWriter Writer;
   if (!globalOutlineRecord.empty())
 Writer.addRecord(globalOutlineRecord);
+  if (!globalMergeRecord.empty())
+Writer.addRecord(globalMergeRecord);
 
   std::error_code EC;
   auto fileName = config->codegenDataGeneratePath;
diff --git a/lld/MachO/InputSection.h b/lld/MachO/InputSection.h
index 7ef0e31066f372..b86520d36cda5b 100644
--- a/lld/MachO/InputSection.h
+++ b/lld/MachO/InputSection.h
@@ -339,6 +339,7 @@ constexpr const char const_[] = "__const";
 constexpr const char lazySymbolPtr[] = "__la_symbol_ptr";
 constexpr const char lazyBinding[] = "__lazy_binding";
 constexpr const char literals[] = "__literals";
+constexpr const char functionmap[] = "__llvm_merge";
 constexpr const char moduleInitFunc[] = "__mod_init_func";
 constexpr const char moduleTermFunc[] = "__mod_term_func";
 constexpr const char nonLazySymbolPtr[] = "__nl_symbol_ptr";
diff --git a/lld/MachO/LTO.cpp b/lld/MachO/LTO.cpp
index 28f5290edb58e3..9bddf9a6445f6d 100644
--- a/lld/MachO/LTO.cpp
+++ b/lld/MachO/LTO.cpp
@@ -25,6 +25,7 @@
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/raw_ostream.h"
+#include "llvm/Transforms/IPO.h"
 #include "llvm/Transforms/ObjCARC.h"
 
 using namespace lld;
@@ -38,6 +39,8 @@ static std::string getThinLTOOutputFile(StringRef modulePath) {
config->thinLTOPrefixReplaceNew);
 }
 
+extern cl::opt<bool> EnableGlobalMergeFunc;
+
 static lto::Config createConfig() {
   lto::Config c;
   c.Options = initTargetOptionsFromCodeGenFlags();
@@ -49,6 +52,10 @@ static lto::Config createConfig() {
   c.MAttrs = getMAttrs();
   c.DiagHandler = diagnosticHandler;
 
+  c.PreCodeGenPassesHook = [](legacy::PassManager &pm) {
+if (EnableGlobalMergeFunc)
+  pm.add(createGlobalMergeFuncPass());
+  };
   c.AlwaysEmitRegularLTOObj = !config->ltoObjPath.empty();
 
   c.TimeTraceEnabled = config->timeTraceEnabled;
diff --git a/lld/test/MachO/cgdata-generate-merge.s b/lld/test/MachO/cgdata-generate-merge.s
new file mode 100644
index 00..3f7fb6777bc3cf
--- /dev/null
+++ b/lld/test/MachO/cgdata-generate-merge.s
@@ -0,0 +1,85 @@
+# UNSUPPORTED: system-windows
+# REQUIRES: aarch64
+
+# RUN: rm -rf %t; split-file %s %t
+
+# Synthesize raw cgdata without the header (32 byte) from the indexed cgdata.
+# RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata
+# RUN: od -t x1 -j 32 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
+# RUN: sed "s//$(cat %t/raw-1-bytes.txt)/g" %t/merge-template.s > %t/merge-1.s
+# RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata
+# RUN: od -t 

[llvm-branch-commits] [llvm] [CGData] Stable Function Map (PR #112662)

2024-10-16 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com created 
https://github.com/llvm/llvm-project/pull/112662

These define the main data structures to represent stable functions and group 
similar functions in a function map.
Serialization is supported in binary or YAML form.
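
As a rough illustration of the data structure described above, the following standalone sketch groups functions by a stable hash and interns names as integer IDs. It is an assumption-laden toy, not the code from this patch: `ToyEntry`, `ToyFunctionMap`, `getIdOrCreate`, `count`, and `numNames` are hypothetical names, and standard containers replace the LLVM ADT types.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified analogue of a function map entry: names are
// stored as small integer IDs rather than strings.
struct ToyEntry {
  uint64_t Hash;
  unsigned FunctionNameId;
  unsigned ModuleNameId;
  unsigned InstCount;
};

class ToyFunctionMap {
  // Functions grouped by stable hash; each group is a set of merge candidates.
  std::unordered_map<uint64_t, std::vector<ToyEntry>> HashToFuncs;
  // Interned names: ID -> name and name -> ID.
  std::vector<std::string> IdToName;
  std::unordered_map<std::string, unsigned> NameToId;

public:
  // Returns the ID for Name, assigning the next free ID on first use.
  unsigned getIdOrCreate(const std::string &Name) {
    auto It = NameToId.find(Name);
    if (It != NameToId.end())
      return It->second;
    unsigned Id = static_cast<unsigned>(IdToName.size());
    IdToName.push_back(Name);
    NameToId.emplace(Name, Id);
    return Id;
  }

  const std::string &getName(unsigned Id) const { return IdToName.at(Id); }

  // Records one function under its hash, interning both names.
  void insert(uint64_t Hash, const std::string &Func,
              const std::string &Module, unsigned InstCount) {
    unsigned FuncId = getIdOrCreate(Func);
    unsigned ModuleId = getIdOrCreate(Module);
    HashToFuncs[Hash].push_back({Hash, FuncId, ModuleId, InstCount});
  }

  // Number of functions recorded under Hash (0 if none).
  std::size_t count(uint64_t Hash) const {
    auto It = HashToFuncs.find(Hash);
    return It == HashToFuncs.end() ? 0 : It->second.size();
  }

  std::size_t numNames() const { return IdToName.size(); }
};
```

Interning keeps each entry small and makes repeated module names cheap, which is presumably why the patch separates a string-based `StableFunction` from an ID-based `StableFunctionEntry`.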

>From e7272c3a0293a0b2972e893335d652cc1ea27ebc Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Sat, 7 Sep 2024 22:48:17 -0700
Subject: [PATCH] [CGData] Stable Function Map

These define the main data structures to represent stable functions and group
similar functions in a function map.
Serialization is supported in binary or YAML form.
---
 llvm/include/llvm/CGData/StableFunctionMap.h  | 139 
 .../llvm/CGData/StableFunctionMapRecord.h |  64 ++
 llvm/lib/CGData/CMakeLists.txt|   2 +
 llvm/lib/CGData/StableFunctionMap.cpp | 167 +++
 llvm/lib/CGData/StableFunctionMapRecord.cpp   | 197 ++
 llvm/unittests/CGData/CMakeLists.txt  |   2 +
 .../CGData/StableFunctionMapRecordTest.cpp| 131 
 .../CGData/StableFunctionMapTest.cpp  | 146 +
 8 files changed, 848 insertions(+)
 create mode 100644 llvm/include/llvm/CGData/StableFunctionMap.h
 create mode 100644 llvm/include/llvm/CGData/StableFunctionMapRecord.h
 create mode 100644 llvm/lib/CGData/StableFunctionMap.cpp
 create mode 100644 llvm/lib/CGData/StableFunctionMapRecord.cpp
 create mode 100644 llvm/unittests/CGData/StableFunctionMapRecordTest.cpp
 create mode 100644 llvm/unittests/CGData/StableFunctionMapTest.cpp

diff --git a/llvm/include/llvm/CGData/StableFunctionMap.h b/llvm/include/llvm/CGData/StableFunctionMap.h
new file mode 100644
index 00..1dbc4257af1340
--- /dev/null
+++ b/llvm/include/llvm/CGData/StableFunctionMap.h
@@ -0,0 +1,139 @@
+//===- StableFunctionMap.h -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===-===//
+//
+// TODO
+//
+//===-===//
+
+#ifndef LLVM_CGDATA_STABLEFUNCTIONMAP_H
+#define LLVM_CGDATA_STABLEFUNCTIONMAP_H
+
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/StableHashing.h"
+#include "llvm/ADT/StringMap.h"
+#include "llvm/IR/StructuralHash.h"
+#include "llvm/ObjectYAML/YAML.h"
+#include "llvm/Support/raw_ostream.h"
+
+#include 
+#include 
+
+namespace llvm {
+
+using IndexPairHash = std::pair;
+using IndexOperandHashVecType = SmallVector;
+
+/// A stable function is a function with a stable hash while tracking the
+/// locations of ignored operands and their hashes.
+struct StableFunction {
+  /// The combined stable hash of the function.
+  stable_hash Hash;
+  /// The name of the function.
+  std::string FunctionName;
+  /// The name of the module the function is in.
+  std::string ModuleName;
+  /// The number of instructions.
+  unsigned InstCount;
+  /// A vector of pairs of IndexPair and operand hash which was skipped.
+  IndexOperandHashVecType IndexOperandHashes;
+
+  StableFunction(stable_hash Hash, const std::string FunctionName,
+ const std::string ModuleName, unsigned InstCount,
+ IndexOperandHashVecType &&IndexOperandHashes)
+  : Hash(Hash), FunctionName(FunctionName), ModuleName(ModuleName),
+InstCount(InstCount),
+IndexOperandHashes(std::move(IndexOperandHashes)) {}
+  StableFunction() = default;
+};
+
+/// An efficient form of StableFunction for fast look-up
+struct StableFunctionEntry {
+  /// The combined stable hash of the function.
+  stable_hash Hash;
+  /// Id of the function name.
+  unsigned FunctionNameId;
+  /// Id of the module name.
+  unsigned ModuleNameId;
+  /// The number of instructions.
+  unsigned InstCount;
+  /// A map from an IndexPair to a stable_hash which was skipped.
+  std::unique_ptr IndexOperandHashMap;
+
+  StableFunctionEntry(
+  stable_hash Hash, unsigned FunctionNameId, unsigned ModuleNameId,
+  unsigned InstCount,
+  std::unique_ptr IndexOperandHashMap)
+  : Hash(Hash), FunctionNameId(FunctionNameId), ModuleNameId(ModuleNameId),
+InstCount(InstCount),
+IndexOperandHashMap(std::move(IndexOperandHashMap)) {}
+};
+
+using HashFuncsMapType =
+DenseMap<stable_hash, SmallVector<std::unique_ptr<StableFunctionEntry>>>;
+
+class StableFunctionMap {
+  /// A map from a stable_hash to a vector of functions with that hash.
+  HashFuncsMapType HashToFuncs;
+  /// A vector of strings to hold names.
+  SmallVector<std::string> IdToName;
+  /// A map from StringRef (name) to an ID.
+  StringMap<unsigned> NameToId;
+  /// True if the function map is finalized with minimal content.
+  bool Finalized = false;
+
+public:
+  /// Get the HashToFuncs map for serialization.
+  const HashFuncsMapType &getFunctionMap() const { return HashToFuncs; }
+
+  /// Get the N

[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-16 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com created 
https://github.com/llvm/llvm-project/pull/112638

This computes a structural hash while allowing for selective ignoring of certain 
operands based on a custom function that is provided. Instead of a single hash 
value, it now returns FunctionHashInfo which includes a hash value, an 
instruction mapping, and a map to track the operand location and its 
corresponding hash value that is ignored.
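
As a rough illustration of the idea described above (not the code in this
patch), the sketch below hashes a toy instruction list, skips the operands
selected by a caller-supplied predicate, and records the skipped operand hashes
by (instruction index, operand index). All names here (ToyInstr, ToyHashInfo,
hashWithDifferences) and the FNV-style mixing are assumptions made for the
example; the real implementation works on llvm::Function and stable_hash.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the LLVM types.
struct ToyInstr {
  std::string Opcode;
  std::vector<std::string> Operands;
};

using ToyHash = std::uint64_t;
using ToyIndexPair = std::pair<unsigned, unsigned>; // (instruction, operand)
using ToyIgnoreFunc = std::function<bool(const ToyInstr &, unsigned)>;

struct ToyHashInfo {
  ToyHash FunctionHash = 0;
  // Hashes of the operands that were left out of FunctionHash.
  std::map<ToyIndexPair, ToyHash> IgnoredOperandHashes;
};

// FNV-1a over the bytes of a string (hypothetical choice of hash).
inline ToyHash hashString(const std::string &S) {
  ToyHash H = 1469598103934665603ULL;
  for (unsigned char C : S) {
    H ^= C;
    H *= 1099511628211ULL;
  }
  return H;
}

// Mix one value into a running hash.
inline ToyHash combine(ToyHash A, ToyHash B) {
  return (A ^ B) * 1099511628211ULL;
}

// Hash every opcode and every operand, except operands for which IgnoreOp
// returns true; those are recorded separately instead of being mixed in.
inline ToyHashInfo hashWithDifferences(const std::vector<ToyInstr> &Body,
                                       const ToyIgnoreFunc &IgnoreOp) {
  ToyHashInfo Info;
  Info.FunctionHash = 1469598103934665603ULL;
  for (unsigned I = 0; I < Body.size(); ++I) {
    Info.FunctionHash = combine(Info.FunctionHash, hashString(Body[I].Opcode));
    for (unsigned Op = 0; Op < Body[I].Operands.size(); ++Op) {
      ToyHash OpHash = hashString(Body[I].Operands[Op]);
      if (IgnoreOp && IgnoreOp(Body[I], Op))
        Info.IgnoredOperandHashes[ToyIndexPair{I, Op}] = OpHash;
      else
        Info.FunctionHash = combine(Info.FunctionHash, OpHash);
    }
  }
  return Info;
}
```

With a predicate that ignores the callee operand of a call, two bodies that
differ only in the callee get the same FunctionHash, while the recorded operand
hashes show where they differ.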

>From 6225d74229d41068c57109a24b063f6fcba13985 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Wed, 16 Oct 2024 17:09:07 -0700
Subject: [PATCH] [StructuralHash] Support Differences

This computes a structural hash while allowing for selective ignoring of
certain operands based on a custom function that is provided.
Instead of a single hash value, it now returns FunctionHashInfo which
includes a hash value, an instruction mapping, and a map to track the
operand location and its corresponding hash value that is ignored.
---
 llvm/include/llvm/IR/StructuralHash.h|  46 ++
 llvm/lib/IR/StructuralHash.cpp   | 188 +--
 llvm/unittests/IR/StructuralHashTest.cpp |  55 +++
 3 files changed, 275 insertions(+), 14 deletions(-)

diff --git a/llvm/include/llvm/IR/StructuralHash.h 
b/llvm/include/llvm/IR/StructuralHash.h
index aa292bc3446799..bc82c204c4d1f6 100644
--- a/llvm/include/llvm/IR/StructuralHash.h
+++ b/llvm/include/llvm/IR/StructuralHash.h
@@ -14,7 +14,9 @@
 #ifndef LLVM_IR_STRUCTURALHASH_H
 #define LLVM_IR_STRUCTURALHASH_H
 
+#include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/StableHashing.h"
+#include "llvm/IR/Instruction.h"
 #include 
 
 namespace llvm {
@@ -23,6 +25,7 @@ class Function;
 class Module;
 
 using IRHash = stable_hash;
+using OpndHash = stable_hash;
 
 /// Returns a hash of the function \p F.
 /// \param F The function to hash.
@@ -37,6 +40,49 @@ IRHash StructuralHash(const Function &F, bool DetailedHash = false);
 /// composed the module hash.
 IRHash StructuralHash(const Module &M, bool DetailedHash = false);
 
+/// The pair of an instruction index and an operand index.
+using IndexPair = std::pair<unsigned, unsigned>;
+
+/// A map from an instruction index to an instruction pointer.
+using IndexInstrMap = MapVector<unsigned, Instruction *>;
+
+/// A map from an IndexPair to an OpndHash.
+using IndexOperandHashMapType = DenseMap<IndexPair, OpndHash>;
+
+/// A function that takes an instruction and an operand index and returns true
+/// if the operand should be ignored in the function hash computation.
+using IgnoreOperandFunc = std::function<bool(const Instruction *, unsigned)>;
+
+struct FunctionHashInfo {
+  /// A hash value representing the structural content of the function
+  IRHash FunctionHash;
+  /// A mapping from instruction indices to instruction pointers
+  std::unique_ptr<IndexInstrMap> IndexInstruction;
+  /// A mapping from pairs of instruction indices and operand indices
+  /// to the hashes of the operands. This can be used to analyze or
+  /// reconstruct the differences in ignored operands
+  std::unique_ptr<IndexOperandHashMapType> IndexOperandHashMap;
+
+  FunctionHashInfo(IRHash FuntionHash,
+   std::unique_ptr<IndexInstrMap> IndexInstruction,
+   std::unique_ptr<IndexOperandHashMapType> IndexOperandHashMap)
+  : FunctionHash(FuntionHash),
+IndexInstruction(std::move(IndexInstruction)),
+IndexOperandHashMap(std::move(IndexOperandHashMap)) {}
+};
+
+/// Computes a structural hash of a given function, considering the structure
+/// and content of the function's instructions while allowing for selective
+/// ignoring of certain operands based on custom criteria. This hash can be used
+/// to identify functions that are structurally similar or identical, which is
+/// useful in optimizations, deduplication, or analysis tasks.
+/// \param F The function to hash.
+/// \param IgnoreOp A callable that takes an instruction and an operand index,
+/// and returns true if the operand should be ignored in the hash computation.
+/// \return A FunctionHashInfo structure
+FunctionHashInfo StructuralHashWithDifferences(const Function &F,
+   IgnoreOperandFunc IgnoreOp);
+
 } // end namespace llvm
 
 #endif
diff --git a/llvm/lib/IR/StructuralHash.cpp b/llvm/lib/IR/StructuralHash.cpp
index a1fabab77d52b2..6e0af666010a05 100644
--- a/llvm/lib/IR/StructuralHash.cpp
+++ b/llvm/lib/IR/StructuralHash.cpp
@@ -28,6 +28,19 @@ class StructuralHashImpl {
 
   bool DetailedHash;
 
+  /// IgnoreOp is a function that returns true if the operand should be ignored.
+  IgnoreOperandFunc IgnoreOp = nullptr;
+  /// A mapping from instruction indices to instruction pointers.
+  /// The index represents the position of an instruction based on the order in
+  /// which it is first encountered.
+  std::unique_ptr<IndexInstrMap> IndexInstruction = nullptr;
+  /// A mapping from pairs of instruction indices and operand indices
+  /// to the hashes of the operands.
+  std::unique_ptr<IndexOperandHashMapType> IndexOperandHashMap = nullptr;
+
+  /// Assign a unique ID to each Value in the order they are first seen.
+  DenseMap ValueToId;
+
   // This will produce diff

[llvm-branch-commits] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)

2024-10-16 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com created 
https://github.com/llvm/llvm-project/pull/112664

This introduces a new cgdata format for stable function maps. The raw data is 
embedded in the __llvm_merge section during compile time. This data can be read 
and merged using the llvm-cgdata tool, into an indexed cgdata file. 
Consequently, the tool is now capable of handling either outlined hash trees, 
stable function maps, or both, as they are orthogonal.
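
One way to picture "either, or both" is a set of independent kind bits that a
reader checks before parsing each payload. The sketch below is only an
illustration under that assumption; ToyDataKind, hasKind, and describe are
hypothetical names and do not reflect the actual cgdata header layout or the
CGDataKind values in this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical kind bits; each payload gets its own bit so that a file may
// carry any combination of them.
enum class ToyDataKind : std::uint32_t {
  None = 0,
  HashTree = 1u << 0,
  FunctionMap = 1u << 1,
};

constexpr ToyDataKind operator|(ToyDataKind A, ToyDataKind B) {
  return static_cast<ToyDataKind>(static_cast<std::uint32_t>(A) |
                                  static_cast<std::uint32_t>(B));
}

// True if Set contains the bit for Kind.
constexpr bool hasKind(ToyDataKind Set, ToyDataKind Kind) {
  return (static_cast<std::uint32_t>(Set) &
          static_cast<std::uint32_t>(Kind)) != 0;
}

// Describe which payloads a reader would expect to find.
inline std::string describe(ToyDataKind Set) {
  bool Tree = hasKind(Set, ToyDataKind::HashTree);
  bool Map = hasKind(Set, ToyDataKind::FunctionMap);
  if (Tree && Map)
    return "hash tree + function map";
  if (Tree)
    return "hash tree";
  if (Map)
    return "function map";
  return "empty";
}
```

Because the bits are independent, adding a further payload kind later would not
change how existing kinds are detected.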

>From af5931f2a7aa020afed0ad474b6e6a7e4c564703 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Mon, 9 Sep 2024 19:38:05 -0700
Subject: [PATCH] [CGData][llvm-cgdata] Support for stable function map

This introduces a new cgdata format for stable function maps.
The raw data is embedded in the __llvm_merge section during compile time.
This data can be read and merged using the llvm-cgdata tool, into an indexed 
cgdata file. Consequently, the tool is now capable of handling either outlined 
hash trees, stable function maps, or both, as they are orthogonal.
---
 llvm/docs/CommandGuide/llvm-cgdata.rst| 16 ++--
 llvm/include/llvm/CGData/CodeGenData.h| 24 +-
 llvm/include/llvm/CGData/CodeGenData.inc  | 12 ++-
 llvm/include/llvm/CGData/CodeGenDataReader.h  | 29 ++-
 llvm/include/llvm/CGData/CodeGenDataWriter.h  | 17 +++-
 llvm/lib/CGData/CodeGenData.cpp   | 30 ---
 llvm/lib/CGData/CodeGenDataReader.cpp | 63 +-
 llvm/lib/CGData/CodeGenDataWriter.cpp | 30 ++-
 llvm/test/tools/llvm-cgdata/empty.test|  8 +-
 llvm/test/tools/llvm-cgdata/error.test| 13 +--
 .../merge-combined-funcmap-hashtree.test  | 66 +++
 .../llvm-cgdata/merge-funcmap-archive.test| 83 +++
 .../llvm-cgdata/merge-funcmap-concat.test | 78 +
 .../llvm-cgdata/merge-funcmap-double.test | 79 ++
 .../llvm-cgdata/merge-funcmap-single.test | 36 
 ...chive.test => merge-hashtree-archive.test} |  8 +-
 ...concat.test => merge-hashtree-concat.test} |  6 +-
 ...double.test => merge-hashtree-double.test} |  8 +-
 ...single.test => merge-hashtree-single.test} |  4 +-
 llvm/tools/llvm-cgdata/llvm-cgdata.cpp| 46 +++---
 20 files changed, 572 insertions(+), 84 deletions(-)
 create mode 100644 
llvm/test/tools/llvm-cgdata/merge-combined-funcmap-hashtree.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-archive.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-concat.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-double.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-single.test
 rename llvm/test/tools/llvm-cgdata/{merge-archive.test => 
merge-hashtree-archive.test} (91%)
 rename llvm/test/tools/llvm-cgdata/{merge-concat.test => 
merge-hashtree-concat.test} (93%)
 rename llvm/test/tools/llvm-cgdata/{merge-double.test => 
merge-hashtree-double.test} (90%)
 rename llvm/test/tools/llvm-cgdata/{merge-single.test => 
merge-hashtree-single.test} (92%)

diff --git a/llvm/docs/CommandGuide/llvm-cgdata.rst 
b/llvm/docs/CommandGuide/llvm-cgdata.rst
index f592e1508844ee..0670decd087e39 100644
--- a/llvm/docs/CommandGuide/llvm-cgdata.rst
+++ b/llvm/docs/CommandGuide/llvm-cgdata.rst
@@ -11,15 +11,13 @@ SYNOPSIS
 DESCRIPTION
 ---
 
-The :program:llvm-cgdata utility parses raw codegen data embedded
-in compiled binary files and merges them into a single .cgdata file.
-It can also inspect and manipulate .cgdata files.
-Currently, the tool supports saving and restoring outlined hash trees,
-enabling global function outlining across modules, allowing for more
-efficient function outlining in subsequent compilations.
-The design is extensible, allowing for the incorporation of additional
-codegen summaries and optimization techniques, such as global function
-merging, in the future.
+The :program:llvm-cgdata utility parses raw codegen data embedded in compiled
+binary files and merges them into a single .cgdata file. It can also inspect
+and manipulate .cgdata files. Currently, the tool supports saving and restoring
+outlined hash trees and stable function maps, allowing for more efficient
+function outlining and function merging across modules in subsequent
+compilations. The design is extensible, allowing for the incorporation of
+additional codegen summaries and optimization techniques.
 
 COMMANDS
 
diff --git a/llvm/include/llvm/CGData/CodeGenData.h 
b/llvm/include/llvm/CGData/CodeGenData.h
index 53550beeae1f83..5d7c74725ccef1 100644
--- a/llvm/include/llvm/CGData/CodeGenData.h
+++ b/llvm/include/llvm/CGData/CodeGenData.h
@@ -19,6 +19,7 @@
 #include "llvm/Bitcode/BitcodeReader.h"
 #include "llvm/CGData/OutlinedHashTree.h"
 #include "llvm/CGData/OutlinedHashTreeRecord.h"
+#include "llvm/CGData/StableFunctionMapRecord.h"
 #include "llvm/IR/Module.h"
 #include "llvm/Object/ObjectFile.h"
 #include "llvm/Support/Caching.h"
@@ -41,7 +42,9 @@ enum class CGDataKind {
  

[llvm-branch-commits] [llvm] [CGData] Global Merge Functions (PR #112671)

2024-10-17 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/112671

>From 1601086634428b95d1a195e5ecb8f5b9d1f1709c Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Fri, 30 Aug 2024 00:09:09 -0700
Subject: [PATCH] [CGData] Global Merge Functions

---
 llvm/include/llvm/CGData/CodeGenData.h|  11 +
 llvm/include/llvm/InitializePasses.h  |   1 +
 llvm/include/llvm/LinkAllPasses.h |   1 +
 llvm/include/llvm/Passes/CodeGenPassBuilder.h |   1 +
 llvm/include/llvm/Transforms/IPO.h|   2 +
 .../Transforms/IPO/GlobalMergeFunctions.h |  77 ++
 llvm/lib/CodeGen/TargetPassConfig.cpp |   3 +
 llvm/lib/LTO/LTO.cpp  |   1 +
 llvm/lib/Transforms/IPO/CMakeLists.txt|   2 +
 .../Transforms/IPO/GlobalMergeFunctions.cpp   | 687 ++
 .../ThinLTO/AArch64/cgdata-merge-local.ll |  62 ++
 .../test/ThinLTO/AArch64/cgdata-merge-read.ll |  82 +++
 .../AArch64/cgdata-merge-two-rounds.ll|  68 ++
 .../ThinLTO/AArch64/cgdata-merge-write.ll |  97 +++
 llvm/tools/llvm-lto2/CMakeLists.txt   |   1 +
 llvm/tools/llvm-lto2/llvm-lto2.cpp|   6 +
 16 files changed, 1102 insertions(+)
 create mode 100644 llvm/include/llvm/Transforms/IPO/GlobalMergeFunctions.h
 create mode 100644 llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-local.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-read.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-two-rounds.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-write.ll

diff --git a/llvm/include/llvm/CGData/CodeGenData.h 
b/llvm/include/llvm/CGData/CodeGenData.h
index 5d7c74725ccef1..da0e412f2a0e03 100644
--- a/llvm/include/llvm/CGData/CodeGenData.h
+++ b/llvm/include/llvm/CGData/CodeGenData.h
@@ -145,6 +145,9 @@ class CodeGenData {
   const OutlinedHashTree *getOutlinedHashTree() {
 return PublishedHashTree.get();
   }
+  const StableFunctionMap *getStableFunctionMap() {
+return PublishedStableFunctionMap.get();
+  }
 
   /// Returns true if we should write codegen data.
   bool emitCGData() { return EmitCGData; }
@@ -169,10 +172,18 @@ inline bool hasOutlinedHashTree() {
   return CodeGenData::getInstance().hasOutlinedHashTree();
 }
 
+inline bool hasStableFunctionMap() {
+  return CodeGenData::getInstance().hasStableFunctionMap();
+}
+
 inline const OutlinedHashTree *getOutlinedHashTree() {
   return CodeGenData::getInstance().getOutlinedHashTree();
 }
 
+inline const StableFunctionMap *getStableFunctionMap() {
+  return CodeGenData::getInstance().getStableFunctionMap();
+}
+
 inline bool emitCGData() { return CodeGenData::getInstance().emitCGData(); }
 
 inline void
diff --git a/llvm/include/llvm/InitializePasses.h 
b/llvm/include/llvm/InitializePasses.h
index 4352099d6dbb99..9aa36d5bb7f801 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -123,6 +123,7 @@ void initializeGCEmptyBasicBlocksPass(PassRegistry &);
 void initializeGCMachineCodeAnalysisPass(PassRegistry &);
 void initializeGCModuleInfoPass(PassRegistry &);
 void initializeGVNLegacyPassPass(PassRegistry &);
+void initializeGlobalMergeFuncPass(PassRegistry &);
 void initializeGlobalMergePass(PassRegistry &);
 void initializeGlobalsAAWrapperPassPass(PassRegistry &);
 void initializeHardwareLoopsLegacyPass(PassRegistry &);
diff --git a/llvm/include/llvm/LinkAllPasses.h 
b/llvm/include/llvm/LinkAllPasses.h
index 92b59a66567c95..ea3609a2b4bc71 100644
--- a/llvm/include/llvm/LinkAllPasses.h
+++ b/llvm/include/llvm/LinkAllPasses.h
@@ -79,6 +79,7 @@ struct ForcePassLinking {
 (void)llvm::createDomOnlyViewerWrapperPassPass();
 (void)llvm::createDomViewerWrapperPassPass();
 (void)llvm::createAlwaysInlinerLegacyPass();
+(void)llvm::createGlobalMergeFuncPass();
 (void)llvm::createGlobalsAAWrapperPass();
 (void)llvm::createInstSimplifyLegacyPass();
 (void)llvm::createInstructionCombiningPass();
diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h 
b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index 13bc4700d87029..96b5b815132bc0 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -74,6 +74,7 @@
 #include "llvm/Target/CGPassBuilderOption.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/CFGuard.h"
+#include "llvm/Transforms/IPO/GlobalMergeFunctions.h"
 #include "llvm/Transforms/Scalar/ConstantHoisting.h"
 #include "llvm/Transforms/Scalar/LoopPassManager.h"
 #include "llvm/Transforms/Scalar/LoopStrengthReduce.h"
diff --git a/llvm/include/llvm/Transforms/IPO.h 
b/llvm/include/llvm/Transforms/IPO.h
index ee0e35aa618325..86a8654f56997c 100644
--- a/llvm/include/llvm/Transforms/IPO.h
+++ b/llvm/include/llvm/Transforms/IPO.h
@@ -55,6 +55,8 @@ enum class PassSummaryAction {
   Export, ///< Export information to summary.
 };
 
+Pass *createGlobalMergeFuncP

[llvm-branch-commits] [lld] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)

2024-10-17 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/112664

>From 09f1ec7730868a53cb566b0913e7952dfc15fa16 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Mon, 9 Sep 2024 19:38:05 -0700
Subject: [PATCH] [CGData][llvm-cgdata] Support for stable function map

This introduces a new cgdata format for stable function maps.
The raw data is embedded in the __llvm_merge section during compile time.
This data can be read and merged using the llvm-cgdata tool, into an indexed 
cgdata file. Consequently, the tool is now capable of handling either outlined 
hash trees, stable function maps, or both, as they are orthogonal.
---
 lld/test/MachO/cgdata-generate.s  |  6 +-
 llvm/docs/CommandGuide/llvm-cgdata.rst| 16 ++--
 llvm/include/llvm/CGData/CodeGenData.h| 24 +-
 llvm/include/llvm/CGData/CodeGenData.inc  | 12 ++-
 llvm/include/llvm/CGData/CodeGenDataReader.h  | 29 ++-
 llvm/include/llvm/CGData/CodeGenDataWriter.h  | 17 +++-
 llvm/lib/CGData/CodeGenData.cpp   | 30 ---
 llvm/lib/CGData/CodeGenDataReader.cpp | 63 +-
 llvm/lib/CGData/CodeGenDataWriter.cpp | 30 ++-
 llvm/test/tools/llvm-cgdata/empty.test|  8 +-
 llvm/test/tools/llvm-cgdata/error.test| 13 +--
 .../merge-combined-funcmap-hashtree.test  | 66 +++
 .../llvm-cgdata/merge-funcmap-archive.test| 83 +++
 .../llvm-cgdata/merge-funcmap-concat.test | 78 +
 .../llvm-cgdata/merge-funcmap-double.test | 79 ++
 .../llvm-cgdata/merge-funcmap-single.test | 36 
 ...chive.test => merge-hashtree-archive.test} |  8 +-
 ...concat.test => merge-hashtree-concat.test} |  6 +-
 ...double.test => merge-hashtree-double.test} |  8 +-
 ...single.test => merge-hashtree-single.test} |  4 +-
 llvm/tools/llvm-cgdata/llvm-cgdata.cpp| 48 ---
 21 files changed, 577 insertions(+), 87 deletions(-)
 create mode 100644 
llvm/test/tools/llvm-cgdata/merge-combined-funcmap-hashtree.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-archive.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-concat.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-double.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-single.test
 rename llvm/test/tools/llvm-cgdata/{merge-archive.test => 
merge-hashtree-archive.test} (91%)
 rename llvm/test/tools/llvm-cgdata/{merge-concat.test => 
merge-hashtree-concat.test} (93%)
 rename llvm/test/tools/llvm-cgdata/{merge-double.test => 
merge-hashtree-double.test} (90%)
 rename llvm/test/tools/llvm-cgdata/{merge-single.test => 
merge-hashtree-single.test} (92%)

diff --git a/lld/test/MachO/cgdata-generate.s b/lld/test/MachO/cgdata-generate.s
index 174df39d666c5d..f942ae07f64e0e 100644
--- a/lld/test/MachO/cgdata-generate.s
+++ b/lld/test/MachO/cgdata-generate.s
@@ -3,12 +3,12 @@
 
 # RUN: rm -rf %t; split-file %s %t
 
-# Synthesize raw cgdata without the header (24 byte) from the indexed cgdata.
+# Synthesize raw cgdata without the header (32 byte) from the indexed cgdata.
 # RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata
-# RUN: od -t x1 -j 24 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
+# RUN: od -t x1 -j 32 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
 # RUN: sed "s//$(cat %t/raw-1-bytes.txt)/g" %t/merge-template.s > %t/merge-1.s
 # RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata
-# RUN: od -t x1 -j 24 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt
+# RUN: od -t x1 -j 32 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt
 # RUN: sed "s//$(cat %t/raw-2-bytes.txt)/g" %t/merge-template.s > %t/merge-2.s
 
 # RUN: llvm-mc -filetype obj -triple arm64-apple-darwin %t/merge-1.s -o %t/merge-1.o
diff --git a/llvm/docs/CommandGuide/llvm-cgdata.rst 
b/llvm/docs/CommandGuide/llvm-cgdata.rst
index f592e1508844ee..0670decd087e39 100644
--- a/llvm/docs/CommandGuide/llvm-cgdata.rst
+++ b/llvm/docs/CommandGuide/llvm-cgdata.rst
@@ -11,15 +11,13 @@ SYNOPSIS
 DESCRIPTION
 ---
 
-The :program:llvm-cgdata utility parses raw codegen data embedded
-in compiled binary files and merges them into a single .cgdata file.
-It can also inspect and manipulate .cgdata files.
-Currently, the tool supports saving and restoring outlined hash trees,
-enabling global function outlining across modules, allowing for more
-efficient function outlining in subsequent compilations.
-The design is extensible, allowing for the incorporation of additional
-codegen summaries and optimization techniques, such as global function
-merging, in the futur

[llvm-branch-commits] [llvm] [CGData] Stable Function Map (PR #112662)

2024-10-17 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/112662

>From 060a23e39a68729859bb7b74e38586b0356e2ba6 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Sat, 7 Sep 2024 22:48:17 -0700
Subject: [PATCH] [CGData] Stable Function Map

These define the main data structures to represent stable functions and group
similar functions in a function map.
Serialization is supported in a binary or yaml form.
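
For readers skimming the patch, the grouping idea can be sketched with standard
containers: names are interned to small integer IDs, and entries are bucketed by
their stable hash so that functions with the same hash end up in one vector. This
is a simplified model, not the patch itself; ToyEntry, ToyFunctionMap,
internName, and nameOf are names assumed for the example, and the per-operand
hash map and serialization are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Compact entry that refers to names by ID (toy model, not the LLVM struct).
struct ToyEntry {
  std::uint64_t Hash;
  unsigned FunctionNameId;
  unsigned ModuleNameId;
  unsigned InstCount;
};

class ToyFunctionMap {
  // Functions grouped by their stable hash.
  std::unordered_map<std::uint64_t, std::vector<ToyEntry>> HashToFuncs;
  // ID -> name and name -> ID tables, so each string is stored once.
  std::vector<std::string> IdToName;
  std::unordered_map<std::string, unsigned> NameToId;

public:
  // Return the existing ID for Name, or assign the next free one.
  unsigned internName(const std::string &Name) {
    auto It = NameToId.find(Name);
    if (It != NameToId.end())
      return It->second;
    unsigned Id = static_cast<unsigned>(IdToName.size());
    IdToName.push_back(Name);
    NameToId.emplace(Name, Id);
    return Id;
  }

  const std::string &nameOf(unsigned Id) const { return IdToName.at(Id); }

  // Add a function under its hash, interning both names.
  void insert(std::uint64_t Hash, const std::string &Func,
              const std::string &Module, unsigned InstCount) {
    unsigned FuncId = internName(Func);
    unsigned ModId = internName(Module);
    HashToFuncs[Hash].push_back(ToyEntry{Hash, FuncId, ModId, InstCount});
  }

  // Return the bucket for Hash, or nullptr if no function has that hash.
  const std::vector<ToyEntry> *lookup(std::uint64_t Hash) const {
    auto It = HashToFuncs.find(Hash);
    return It == HashToFuncs.end() ? nullptr : &It->second;
  }
};
```

A bucket with two or more entries is a group of merge candidates; a module name
shared by many functions costs one string plus one integer per entry.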
---
 llvm/include/llvm/CGData/StableFunctionMap.h  | 139 
 .../llvm/CGData/StableFunctionMapRecord.h |  68 ++
 llvm/lib/CGData/CMakeLists.txt|   2 +
 llvm/lib/CGData/StableFunctionMap.cpp | 167 +++
 llvm/lib/CGData/StableFunctionMapRecord.cpp   | 202 ++
 llvm/unittests/CGData/CMakeLists.txt  |   2 +
 .../CGData/StableFunctionMapRecordTest.cpp| 131 
 .../CGData/StableFunctionMapTest.cpp  | 146 +
 8 files changed, 857 insertions(+)
 create mode 100644 llvm/include/llvm/CGData/StableFunctionMap.h
 create mode 100644 llvm/include/llvm/CGData/StableFunctionMapRecord.h
 create mode 100644 llvm/lib/CGData/StableFunctionMap.cpp
 create mode 100644 llvm/lib/CGData/StableFunctionMapRecord.cpp
 create mode 100644 llvm/unittests/CGData/StableFunctionMapRecordTest.cpp
 create mode 100644 llvm/unittests/CGData/StableFunctionMapTest.cpp

diff --git a/llvm/include/llvm/CGData/StableFunctionMap.h 
b/llvm/include/llvm/CGData/StableFunctionMap.h
new file mode 100644
index 00..ec205ef846f5c9
--- /dev/null
+++ b/llvm/include/llvm/CGData/StableFunctionMap.h
@@ -0,0 +1,139 @@
+//===- StableFunctionMap.h -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===-===//
+//
+// TODO
+//
+//===-===//
+
+#ifndef LLVM_CGDATA_STABLEFUNCTIONMAP_H
+#define LLVM_CGDATA_STABLEFUNCTIONMAP_H
+
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/StableHashing.h"
+#include "llvm/ADT/StringMap.h"
+#include "llvm/IR/StructuralHash.h"
+#include "llvm/ObjectYAML/YAML.h"
+#include "llvm/Support/raw_ostream.h"
+
+#include 
+#include 
+
+namespace llvm {
+
+using IndexPairHash = std::pair<IndexPair, stable_hash>;
+using IndexOperandHashVecType = SmallVector<IndexPairHash>;
+
+/// A stable function is a function with a stable hash while tracking the
+/// locations of ignored operands and their hashes.
+struct StableFunction {
+  /// The combined stable hash of the function.
+  stable_hash Hash;
+  /// The name of the function.
+  std::string FunctionName;
+  /// The name of the module the function is in.
+  std::string ModuleName;
+  /// The number of instructions.
+  unsigned InstCount;
+  /// A vector of pairs of IndexPair and operand hash which was skipped.
+  IndexOperandHashVecType IndexOperandHashes;
+
+  StableFunction(stable_hash Hash, const std::string FunctionName,
+ const std::string ModuleName, unsigned InstCount,
+ IndexOperandHashVecType &&IndexOperandHashes)
+  : Hash(Hash), FunctionName(FunctionName), ModuleName(ModuleName),
+InstCount(InstCount),
+IndexOperandHashes(std::move(IndexOperandHashes)) {}
+  StableFunction() = default;
+};
+
+/// An efficient form of StableFunction for fast look-up
+struct StableFunctionEntry {
+  /// The combined stable hash of the function.
+  stable_hash Hash;
+  /// Id of the function name.
+  unsigned FunctionNameId;
+  /// Id of the module name.
+  unsigned ModuleNameId;
+  /// The number of instructions.
+  unsigned InstCount;
+  /// A map from an IndexPair to a stable_hash which was skipped.
+  std::unique_ptr<IndexOperandHashMapType> IndexOperandHashMap;
+
+  StableFunctionEntry(
+  stable_hash Hash, unsigned FunctionNameId, unsigned ModuleNameId,
+  unsigned InstCount,
+  std::unique_ptr<IndexOperandHashMapType> IndexOperandHashMap)
+  : Hash(Hash), FunctionNameId(FunctionNameId), ModuleNameId(ModuleNameId),
+InstCount(InstCount),
+IndexOperandHashMap(std::move(IndexOperandHashMap)) {}
+};
+
+using HashFuncsMapType =
+DenseMap<stable_hash, SmallVector<std::unique_ptr<StableFunctionEntry>>>;
+
+class StableFunctionMap {
+  /// A map from a stable_hash to a vector of functions with that hash.
+  HashFuncsMapType HashToFuncs;
+  /// A vector of strings to hold names.
+  SmallVector<std::string> IdToName;
+  /// A map from StringRef (name) to an ID.
+  StringMap<unsigned> NameToId;
+  /// True if the function map is finalized with minimal content.
+  bool Finalized = false;
+
+public:
+  /// Get the HashToFuncs map for serialization.
+  const HashFuncsMapType &getFunctionMap() const { return HashToFuncs; }
+
+  /// Get the NameToId vector for serialization.
+  const SmallVector<std::string> getNames() const { return IdToName; }
+
+  /// Get an existing ID associated with the given name or create a new ID

[llvm-branch-commits] [lld] [CGData][lld-macho] Add Global Merge Func Pass (PR #112674)

2024-10-17 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/112674

>From 6b0b6194a02209036e032a8941f8e5817b402318 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Wed, 16 Oct 2024 22:56:38 -0700
Subject: [PATCH] [CGData][lld-macho] Add Global Merge Func Pass

---
 lld/MachO/CMakeLists.txt   |  2 +
 lld/MachO/Driver.cpp   | 18 +-
 lld/MachO/InputSection.h   |  1 +
 lld/MachO/LTO.cpp  |  7 +++
 lld/test/MachO/cgdata-generate-merge.s | 85 ++
 5 files changed, 112 insertions(+), 1 deletion(-)
 create mode 100644 lld/test/MachO/cgdata-generate-merge.s

diff --git a/lld/MachO/CMakeLists.txt b/lld/MachO/CMakeLists.txt
index ecf6ce609e59f2..137fe4939b4457 100644
--- a/lld/MachO/CMakeLists.txt
+++ b/lld/MachO/CMakeLists.txt
@@ -41,9 +41,11 @@ add_lld_library(lldMachO
   BitReader
   BitWriter
   CGData
+  CodeGen
   Core
   DebugInfoDWARF
   Demangle
+  IPO
   LTO
   MC
   ObjCARCOpts
diff --git a/lld/MachO/Driver.cpp b/lld/MachO/Driver.cpp
index ab4abb1fa97efc..59c24a06a2cb20 100644
--- a/lld/MachO/Driver.cpp
+++ b/lld/MachO/Driver.cpp
@@ -1326,7 +1326,8 @@ static void codegenDataGenerate() {
   TimeTraceScope timeScope("Generating codegen data");
 
   OutlinedHashTreeRecord globalOutlineRecord;
-  for (ConcatInputSection *isec : inputSections)
+  StableFunctionMapRecord globalMergeRecord;
+  for (ConcatInputSection *isec : inputSections) {
 if (isec->getSegName() == segment_names::data &&
 isec->getName() == section_names::outlinedHashTree) {
   // Read outlined hash tree from each section.
@@ -1337,10 +1338,25 @@ static void codegenDataGenerate() {
   // Merge it to the global hash tree.
   globalOutlineRecord.merge(localOutlineRecord);
 }
+if (isec->getSegName() == segment_names::data &&
+isec->getName() == section_names::functionmap) {
+  // Read stable functions from each section.
+  StableFunctionMapRecord localMergeRecord;
+  auto *data = isec->data.data();
+  localMergeRecord.deserialize(data);
+
+  // Merge it to the global function map.
+  globalMergeRecord.merge(localMergeRecord);
+}
+  }
+
+  globalMergeRecord.finalize();
 
   CodeGenDataWriter Writer;
   if (!globalOutlineRecord.empty())
 Writer.addRecord(globalOutlineRecord);
+  if (!globalMergeRecord.empty())
+Writer.addRecord(globalMergeRecord);
 
   std::error_code EC;
   auto fileName = config->codegenDataGeneratePath;
diff --git a/lld/MachO/InputSection.h b/lld/MachO/InputSection.h
index 7ef0e31066f372..b86520d36cda5b 100644
--- a/lld/MachO/InputSection.h
+++ b/lld/MachO/InputSection.h
@@ -339,6 +339,7 @@ constexpr const char const_[] = "__const";
 constexpr const char lazySymbolPtr[] = "__la_symbol_ptr";
 constexpr const char lazyBinding[] = "__lazy_binding";
 constexpr const char literals[] = "__literals";
+constexpr const char functionmap[] = "__llvm_merge";
 constexpr const char moduleInitFunc[] = "__mod_init_func";
 constexpr const char moduleTermFunc[] = "__mod_term_func";
 constexpr const char nonLazySymbolPtr[] = "__nl_symbol_ptr";
diff --git a/lld/MachO/LTO.cpp b/lld/MachO/LTO.cpp
index 28f5290edb58e3..9bddf9a6445f6d 100644
--- a/lld/MachO/LTO.cpp
+++ b/lld/MachO/LTO.cpp
@@ -25,6 +25,7 @@
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/raw_ostream.h"
+#include "llvm/Transforms/IPO.h"
 #include "llvm/Transforms/ObjCARC.h"
 
 using namespace lld;
@@ -38,6 +39,8 @@ static std::string getThinLTOOutputFile(StringRef modulePath) {
config->thinLTOPrefixReplaceNew);
 }
 
+extern cl::opt<bool> EnableGlobalMergeFunc;
+
 static lto::Config createConfig() {
   lto::Config c;
   c.Options = initTargetOptionsFromCodeGenFlags();
@@ -49,6 +52,10 @@ static lto::Config createConfig() {
   c.MAttrs = getMAttrs();
   c.DiagHandler = diagnosticHandler;
 
+  c.PreCodeGenPassesHook = [](legacy::PassManager &pm) {
+if (EnableGlobalMergeFunc)
+  pm.add(createGlobalMergeFuncPass());
+  };
   c.AlwaysEmitRegularLTOObj = !config->ltoObjPath.empty();
 
   c.TimeTraceEnabled = config->timeTraceEnabled;
diff --git a/lld/test/MachO/cgdata-generate-merge.s 
b/lld/test/MachO/cgdata-generate-merge.s
new file mode 100644
index 00..3f7fb6777bc3cf
--- /dev/null
+++ b/lld/test/MachO/cgdata-generate-merge.s
@@ -0,0 +1,85 @@
+# UNSUPPORTED: system-windows
+# REQUIRES: aarch64
+
+# RUN: rm -rf %t; split-file %s %t
+
+# Synthesize raw cgdata without the header (32 byte) from the indexed cgdata.
+# RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata
+# RUN: od -t x1 -j 32 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
+# RUN: sed "s//$(cat %t/raw-1-bytes.txt)/g" %t/merge-template.s > %t/merge-1.s
+# RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata
+# RUN: od -t x1 -j 

[llvm-branch-commits] [llvm] [CGData] Global Merge Functions (PR #112671)

2024-10-17 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/112671

>From ded5771bb4ff7c8fd5401b4efe0af988539a8162 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Fri, 30 Aug 2024 00:09:09 -0700
Subject: [PATCH] [CGData] Global Merge Functions

---
 llvm/include/llvm/CGData/CodeGenData.h|  11 +
 llvm/include/llvm/InitializePasses.h  |   1 +
 llvm/include/llvm/LinkAllPasses.h |   1 +
 llvm/include/llvm/Passes/CodeGenPassBuilder.h |   1 +
 llvm/include/llvm/Transforms/IPO.h|   2 +
 .../Transforms/IPO/GlobalMergeFunctions.h |  77 ++
 llvm/lib/CodeGen/TargetPassConfig.cpp |   3 +
 llvm/lib/LTO/LTO.cpp  |   1 +
 llvm/lib/Transforms/IPO/CMakeLists.txt|   2 +
 .../Transforms/IPO/GlobalMergeFunctions.cpp   | 687 ++
 .../ThinLTO/AArch64/cgdata-merge-local.ll |  62 ++
 .../test/ThinLTO/AArch64/cgdata-merge-read.ll |  82 +++
 .../AArch64/cgdata-merge-two-rounds.ll|  68 ++
 .../ThinLTO/AArch64/cgdata-merge-write.ll |  97 +++
 llvm/tools/llvm-lto2/CMakeLists.txt   |   1 +
 llvm/tools/llvm-lto2/llvm-lto2.cpp|   6 +
 16 files changed, 1102 insertions(+)
 create mode 100644 llvm/include/llvm/Transforms/IPO/GlobalMergeFunctions.h
 create mode 100644 llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-local.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-read.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-two-rounds.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-write.ll

diff --git a/llvm/include/llvm/CGData/CodeGenData.h 
b/llvm/include/llvm/CGData/CodeGenData.h
index 5d7c74725ccef1..da0e412f2a0e03 100644
--- a/llvm/include/llvm/CGData/CodeGenData.h
+++ b/llvm/include/llvm/CGData/CodeGenData.h
@@ -145,6 +145,9 @@ class CodeGenData {
   const OutlinedHashTree *getOutlinedHashTree() {
     return PublishedHashTree.get();
   }
+  const StableFunctionMap *getStableFunctionMap() {
+    return PublishedStableFunctionMap.get();
+  }
 
   /// Returns true if we should write codegen data.
   bool emitCGData() { return EmitCGData; }
@@ -169,10 +172,18 @@ inline bool hasOutlinedHashTree() {
   return CodeGenData::getInstance().hasOutlinedHashTree();
 }
 
+inline bool hasStableFunctionMap() {
+  return CodeGenData::getInstance().hasStableFunctionMap();
+}
+
 inline const OutlinedHashTree *getOutlinedHashTree() {
   return CodeGenData::getInstance().getOutlinedHashTree();
 }
 
+inline const StableFunctionMap *getStableFunctionMap() {
+  return CodeGenData::getInstance().getStableFunctionMap();
+}
+
 inline bool emitCGData() { return CodeGenData::getInstance().emitCGData(); }
 
 inline void
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 4352099d6dbb99..9aa36d5bb7f801 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -123,6 +123,7 @@ void initializeGCEmptyBasicBlocksPass(PassRegistry &);
 void initializeGCMachineCodeAnalysisPass(PassRegistry &);
 void initializeGCModuleInfoPass(PassRegistry &);
 void initializeGVNLegacyPassPass(PassRegistry &);
+void initializeGlobalMergeFuncPass(PassRegistry &);
 void initializeGlobalMergePass(PassRegistry &);
 void initializeGlobalsAAWrapperPassPass(PassRegistry &);
 void initializeHardwareLoopsLegacyPass(PassRegistry &);
diff --git a/llvm/include/llvm/LinkAllPasses.h b/llvm/include/llvm/LinkAllPasses.h
index 92b59a66567c95..ea3609a2b4bc71 100644
--- a/llvm/include/llvm/LinkAllPasses.h
+++ b/llvm/include/llvm/LinkAllPasses.h
@@ -79,6 +79,7 @@ struct ForcePassLinking {
 (void)llvm::createDomOnlyViewerWrapperPassPass();
 (void)llvm::createDomViewerWrapperPassPass();
 (void)llvm::createAlwaysInlinerLegacyPass();
+(void)llvm::createGlobalMergeFuncPass();
 (void)llvm::createGlobalsAAWrapperPass();
 (void)llvm::createInstSimplifyLegacyPass();
 (void)llvm::createInstructionCombiningPass();
diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index 13bc4700d87029..96b5b815132bc0 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -74,6 +74,7 @@
 #include "llvm/Target/CGPassBuilderOption.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/CFGuard.h"
+#include "llvm/Transforms/IPO/GlobalMergeFunctions.h"
 #include "llvm/Transforms/Scalar/ConstantHoisting.h"
 #include "llvm/Transforms/Scalar/LoopPassManager.h"
 #include "llvm/Transforms/Scalar/LoopStrengthReduce.h"
diff --git a/llvm/include/llvm/Transforms/IPO.h b/llvm/include/llvm/Transforms/IPO.h
index ee0e35aa618325..86a8654f56997c 100644
--- a/llvm/include/llvm/Transforms/IPO.h
+++ b/llvm/include/llvm/Transforms/IPO.h
@@ -55,6 +55,8 @@ enum class PassSummaryAction {
   Export, ///< Export information to summary.
 };
 
+Pass *createGlobalMergeFuncP

[llvm-branch-commits] [lld] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)

2024-10-17 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/112664

>From 09f1ec7730868a53cb566b0913e7952dfc15fa16 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Mon, 9 Sep 2024 19:38:05 -0700
Subject: [PATCH] [CGData][llvm-cgdata] Support for stable function map

This introduces a new cgdata format for stable function maps.
The raw data is embedded in the __llvm_merge section during compile time.
This data can be read and merged using the llvm-cgdata tool, into an indexed 
cgdata file. Consequently, the tool is now capable of handling either outlined 
hash trees, stable function maps, or both, as they are orthogonal.
---
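Note for readers: the cgdata-generate.s hunk in this patch bumps the skipped
header from 24 to 32 bytes, which is consistent with one extra 8-byte field
being added to the indexed cgdata header. The sketch here only illustrates
that reading and is not code from this patch: the struct names, field names,
and flag values are all assumptions, and the real definitions live in
CodeGenData.h and CodeGenData.inc. It also shows how two independent kind bits
would let one file carry an outlined hash tree, a stable function map, or both.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical kind bits; names and values are assumptions for illustration.
enum KindBit : uint32_t {
  KindHashTree = 1u << 0, // file carries an outlined hash tree
  KindFuncMap = 1u << 1,  // file carries a stable function map
};

// Assumed pre-patch header: 8 + 4 + 4 + 8 = 24 bytes.
struct HeaderBefore {
  uint64_t Magic;
  uint32_t Version;
  uint32_t DataKind;
  uint64_t OutlinedHashTreeOffset;
};

// Assumed post-patch header: one more 8-byte offset gives 32 bytes.
struct HeaderAfter {
  uint64_t Magic;
  uint32_t Version;
  uint32_t DataKind;
  uint64_t OutlinedHashTreeOffset;
  uint64_t StableFunctionMapOffset;
};

static_assert(sizeof(HeaderBefore) == 24, "assumed old header size");
static_assert(sizeof(HeaderAfter) == 32, "assumed new header size");

// Each payload is tested independently, so the two kinds stay orthogonal.
inline bool hasHashTree(uint32_t Kind) { return (Kind & KindHashTree) != 0; }
inline bool hasFuncMap(uint32_t Kind) { return (Kind & KindFuncMap) != 0; }
```

With bit flags like these, a reader can check each payload separately, which
is one plausible way to get the "orthogonal" behavior the commit message
describes.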
 lld/test/MachO/cgdata-generate.s  |  6 +-
 llvm/docs/CommandGuide/llvm-cgdata.rst| 16 ++--
 llvm/include/llvm/CGData/CodeGenData.h| 24 +-
 llvm/include/llvm/CGData/CodeGenData.inc  | 12 ++-
 llvm/include/llvm/CGData/CodeGenDataReader.h  | 29 ++-
 llvm/include/llvm/CGData/CodeGenDataWriter.h  | 17 +++-
 llvm/lib/CGData/CodeGenData.cpp   | 30 ---
 llvm/lib/CGData/CodeGenDataReader.cpp | 63 +-
 llvm/lib/CGData/CodeGenDataWriter.cpp | 30 ++-
 llvm/test/tools/llvm-cgdata/empty.test|  8 +-
 llvm/test/tools/llvm-cgdata/error.test| 13 +--
 .../merge-combined-funcmap-hashtree.test  | 66 +++
 .../llvm-cgdata/merge-funcmap-archive.test| 83 +++
 .../llvm-cgdata/merge-funcmap-concat.test | 78 +
 .../llvm-cgdata/merge-funcmap-double.test | 79 ++
 .../llvm-cgdata/merge-funcmap-single.test | 36 
 ...chive.test => merge-hashtree-archive.test} |  8 +-
 ...concat.test => merge-hashtree-concat.test} |  6 +-
 ...double.test => merge-hashtree-double.test} |  8 +-
 ...single.test => merge-hashtree-single.test} |  4 +-
 llvm/tools/llvm-cgdata/llvm-cgdata.cpp| 48 ---
 21 files changed, 577 insertions(+), 87 deletions(-)
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-combined-funcmap-hashtree.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-archive.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-concat.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-double.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-single.test
 rename llvm/test/tools/llvm-cgdata/{merge-archive.test => merge-hashtree-archive.test} (91%)
 rename llvm/test/tools/llvm-cgdata/{merge-concat.test => merge-hashtree-concat.test} (93%)
 rename llvm/test/tools/llvm-cgdata/{merge-double.test => merge-hashtree-double.test} (90%)
 rename llvm/test/tools/llvm-cgdata/{merge-single.test => merge-hashtree-single.test} (92%)

diff --git a/lld/test/MachO/cgdata-generate.s b/lld/test/MachO/cgdata-generate.s
index 174df39d666c5d..f942ae07f64e0e 100644
--- a/lld/test/MachO/cgdata-generate.s
+++ b/lld/test/MachO/cgdata-generate.s
@@ -3,12 +3,12 @@
 
 # RUN: rm -rf %t; split-file %s %t
 
-# Synthesize raw cgdata without the header (24 byte) from the indexed cgdata.
+# Synthesize raw cgdata without the header (32 byte) from the indexed cgdata.
 # RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata
-# RUN: od -t x1 -j 24 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
+# RUN: od -t x1 -j 32 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
 # RUN: sed "s/<RAW_BYTES>/$(cat %t/raw-1-bytes.txt)/g" %t/merge-template.s > %t/merge-1.s
 # RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata
-# RUN: od -t x1 -j 24 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt
+# RUN: od -t x1 -j 32 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt
 # RUN: sed "s/<RAW_BYTES>/$(cat %t/raw-2-bytes.txt)/g" %t/merge-template.s > %t/merge-2.s
 
 # RUN: llvm-mc -filetype obj -triple arm64-apple-darwin %t/merge-1.s -o %t/merge-1.o
diff --git a/llvm/docs/CommandGuide/llvm-cgdata.rst b/llvm/docs/CommandGuide/llvm-cgdata.rst
index f592e1508844ee..0670decd087e39 100644
--- a/llvm/docs/CommandGuide/llvm-cgdata.rst
+++ b/llvm/docs/CommandGuide/llvm-cgdata.rst
@@ -11,15 +11,13 @@ SYNOPSIS
 DESCRIPTION
 ---
 
-The :program:`llvm-cgdata` utility parses raw codegen data embedded
-in compiled binary files and merges them into a single .cgdata file.
-It can also inspect and manipulate .cgdata files.
-Currently, the tool supports saving and restoring outlined hash trees,
-enabling global function outlining across modules, allowing for more
-efficient function outlining in subsequent compilations.
-The design is extensible, allowing for the incorporation of additional
-codegen summaries and optimization techniques, such as global function
-merging, in the futur

[llvm-branch-commits] [llvm] [CGData] Global Merge Functions (PR #112671)

2024-10-17 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/112671

>From 584a2d7fdadf91838b7b305a1b09056fcb0e805f Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Fri, 30 Aug 2024 00:09:09 -0700
Subject: [PATCH] [CGData] Global Merge Functions

---
 llvm/include/llvm/CGData/CodeGenData.h|  11 +
 llvm/include/llvm/InitializePasses.h  |   1 +
 llvm/include/llvm/LinkAllPasses.h |   1 +
 llvm/include/llvm/Passes/CodeGenPassBuilder.h |   1 +
 llvm/include/llvm/Transforms/IPO.h|   2 +
 .../Transforms/IPO/GlobalMergeFunctions.h |  73 ++
 llvm/lib/CodeGen/TargetPassConfig.cpp |   3 +
 llvm/lib/LTO/LTO.cpp  |   1 +
 llvm/lib/Transforms/IPO/CMakeLists.txt|   2 +
 .../Transforms/IPO/GlobalMergeFunctions.cpp   | 666 ++
 .../ThinLTO/AArch64/cgdata-merge-local.ll |  62 ++
 .../test/ThinLTO/AArch64/cgdata-merge-read.ll |  82 +++
 .../AArch64/cgdata-merge-two-rounds.ll|  68 ++
 .../ThinLTO/AArch64/cgdata-merge-write.ll |  97 +++
 llvm/tools/llvm-lto2/CMakeLists.txt   |   1 +
 llvm/tools/llvm-lto2/llvm-lto2.cpp|   6 +
 16 files changed, 1077 insertions(+)
 create mode 100644 llvm/include/llvm/Transforms/IPO/GlobalMergeFunctions.h
 create mode 100644 llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-local.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-read.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-two-rounds.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-write.ll

diff --git a/llvm/include/llvm/CGData/CodeGenData.h b/llvm/include/llvm/CGData/CodeGenData.h
index 5d7c74725ccef1..da0e412f2a0e03 100644
--- a/llvm/include/llvm/CGData/CodeGenData.h
+++ b/llvm/include/llvm/CGData/CodeGenData.h
@@ -145,6 +145,9 @@ class CodeGenData {
   const OutlinedHashTree *getOutlinedHashTree() {
     return PublishedHashTree.get();
   }
+  const StableFunctionMap *getStableFunctionMap() {
+    return PublishedStableFunctionMap.get();
+  }
 
   /// Returns true if we should write codegen data.
   bool emitCGData() { return EmitCGData; }
@@ -169,10 +172,18 @@ inline bool hasOutlinedHashTree() {
   return CodeGenData::getInstance().hasOutlinedHashTree();
 }
 
+inline bool hasStableFunctionMap() {
+  return CodeGenData::getInstance().hasStableFunctionMap();
+}
+
 inline const OutlinedHashTree *getOutlinedHashTree() {
   return CodeGenData::getInstance().getOutlinedHashTree();
 }
 
+inline const StableFunctionMap *getStableFunctionMap() {
+  return CodeGenData::getInstance().getStableFunctionMap();
+}
+
 inline bool emitCGData() { return CodeGenData::getInstance().emitCGData(); }
 
 inline void
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 4352099d6dbb99..9aa36d5bb7f801 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -123,6 +123,7 @@ void initializeGCEmptyBasicBlocksPass(PassRegistry &);
 void initializeGCMachineCodeAnalysisPass(PassRegistry &);
 void initializeGCModuleInfoPass(PassRegistry &);
 void initializeGVNLegacyPassPass(PassRegistry &);
+void initializeGlobalMergeFuncPass(PassRegistry &);
 void initializeGlobalMergePass(PassRegistry &);
 void initializeGlobalsAAWrapperPassPass(PassRegistry &);
 void initializeHardwareLoopsLegacyPass(PassRegistry &);
diff --git a/llvm/include/llvm/LinkAllPasses.h b/llvm/include/llvm/LinkAllPasses.h
index 92b59a66567c95..ea3609a2b4bc71 100644
--- a/llvm/include/llvm/LinkAllPasses.h
+++ b/llvm/include/llvm/LinkAllPasses.h
@@ -79,6 +79,7 @@ struct ForcePassLinking {
 (void)llvm::createDomOnlyViewerWrapperPassPass();
 (void)llvm::createDomViewerWrapperPassPass();
 (void)llvm::createAlwaysInlinerLegacyPass();
+(void)llvm::createGlobalMergeFuncPass();
 (void)llvm::createGlobalsAAWrapperPass();
 (void)llvm::createInstSimplifyLegacyPass();
 (void)llvm::createInstructionCombiningPass();
diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index 13bc4700d87029..96b5b815132bc0 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -74,6 +74,7 @@
 #include "llvm/Target/CGPassBuilderOption.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/CFGuard.h"
+#include "llvm/Transforms/IPO/GlobalMergeFunctions.h"
 #include "llvm/Transforms/Scalar/ConstantHoisting.h"
 #include "llvm/Transforms/Scalar/LoopPassManager.h"
 #include "llvm/Transforms/Scalar/LoopStrengthReduce.h"
diff --git a/llvm/include/llvm/Transforms/IPO.h b/llvm/include/llvm/Transforms/IPO.h
index ee0e35aa618325..86a8654f56997c 100644
--- a/llvm/include/llvm/Transforms/IPO.h
+++ b/llvm/include/llvm/Transforms/IPO.h
@@ -55,6 +55,8 @@ enum class PassSummaryAction {
   Export, ///< Export information to summary.
 };
 
+Pass *createGlobalMergeFuncP

[llvm-branch-commits] [lld] [CGData][lld-macho] Add Global Merge Func Pass (PR #112674)

2024-10-17 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/112674

>From ead1aee8eeb4046ec0641c09652cea726becd48a Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Wed, 16 Oct 2024 22:56:38 -0700
Subject: [PATCH] [CGData][lld-macho] Add Global Merge Func Pass

---
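Note for readers: the Driver.cpp hunk in this patch follows a
deserialize-per-section, merge-into-global, finalize-once pattern. The toy
version here is a sketch of that control flow only. ToyRecord is a
hypothetical stand-in for StableFunctionMapRecord, its hash-count payload is
invented for illustration, and what the real finalize() does is not visible in
this patch; the sketch simply assumes it prunes entries that cannot lead to a
merge (hashes seen only once).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for StableFunctionMapRecord (not the LLVM class).
struct ToyRecord {
  // Stable hash -> number of functions seen with that hash.
  std::map<uint64_t, unsigned> Counts;

  // Stand-in for deserialize(): load one section's worth of hashes.
  void deserialize(const std::vector<uint64_t> &Hashes) {
    for (uint64_t H : Hashes)
      ++Counts[H];
  }

  // Fold another record into this one.
  void merge(const ToyRecord &Other) {
    for (const auto &KV : Other.Counts)
      Counts[KV.first] += KV.second;
  }

  // Assumed behavior: drop hashes that occur fewer than two times.
  void finalize() {
    for (auto It = Counts.begin(); It != Counts.end();) {
      if (It->second < 2)
        It = Counts.erase(It);
      else
        ++It;
    }
  }

  bool empty() const { return Counts.empty(); }
};

// Mirrors the loop shape in codegenDataGenerate(): one local record per
// section, merged into a global record that is finalized exactly once.
ToyRecord mergeSections(const std::vector<std::vector<uint64_t>> &Sections) {
  ToyRecord Global;
  for (const auto &Section : Sections) {
    ToyRecord Local;
    Local.deserialize(Section);
    Global.merge(Local);
  }
  Global.finalize();
  return Global;
}
```

Finalizing after the loop rather than per section matters in this toy model: a
hash that appears once in each of two sections only becomes a candidate after
the merge.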
 lld/MachO/CMakeLists.txt   |  2 +
 lld/MachO/Driver.cpp   | 18 +-
 lld/MachO/InputSection.h   |  1 +
 lld/MachO/LTO.cpp  |  7 +++
 lld/test/MachO/cgdata-generate-merge.s | 85 ++
 5 files changed, 112 insertions(+), 1 deletion(-)
 create mode 100644 lld/test/MachO/cgdata-generate-merge.s

diff --git a/lld/MachO/CMakeLists.txt b/lld/MachO/CMakeLists.txt
index ecf6ce609e59f2..137fe4939b4457 100644
--- a/lld/MachO/CMakeLists.txt
+++ b/lld/MachO/CMakeLists.txt
@@ -41,9 +41,11 @@ add_lld_library(lldMachO
   BitReader
   BitWriter
   CGData
+  CodeGen
   Core
   DebugInfoDWARF
   Demangle
+  IPO
   LTO
   MC
   ObjCARCOpts
diff --git a/lld/MachO/Driver.cpp b/lld/MachO/Driver.cpp
index ab4abb1fa97efc..59c24a06a2cb20 100644
--- a/lld/MachO/Driver.cpp
+++ b/lld/MachO/Driver.cpp
@@ -1326,7 +1326,8 @@ static void codegenDataGenerate() {
   TimeTraceScope timeScope("Generating codegen data");
 
   OutlinedHashTreeRecord globalOutlineRecord;
-  for (ConcatInputSection *isec : inputSections)
+  StableFunctionMapRecord globalMergeRecord;
+  for (ConcatInputSection *isec : inputSections) {
     if (isec->getSegName() == segment_names::data &&
         isec->getName() == section_names::outlinedHashTree) {
       // Read outlined hash tree from each section.
@@ -1337,10 +1338,25 @@ static void codegenDataGenerate() {
       // Merge it to the global hash tree.
       globalOutlineRecord.merge(localOutlineRecord);
     }
+    if (isec->getSegName() == segment_names::data &&
+        isec->getName() == section_names::functionmap) {
+      // Read stable functions from each section.
+      StableFunctionMapRecord localMergeRecord;
+      auto *data = isec->data.data();
+      localMergeRecord.deserialize(data);
+
+      // Merge it to the global function map.
+      globalMergeRecord.merge(localMergeRecord);
+    }
+  }
+
+  globalMergeRecord.finalize();
 
   CodeGenDataWriter Writer;
   if (!globalOutlineRecord.empty())
     Writer.addRecord(globalOutlineRecord);
+  if (!globalMergeRecord.empty())
+    Writer.addRecord(globalMergeRecord);
 
   std::error_code EC;
   auto fileName = config->codegenDataGeneratePath;
diff --git a/lld/MachO/InputSection.h b/lld/MachO/InputSection.h
index 7ef0e31066f372..b86520d36cda5b 100644
--- a/lld/MachO/InputSection.h
+++ b/lld/MachO/InputSection.h
@@ -339,6 +339,7 @@ constexpr const char const_[] = "__const";
 constexpr const char lazySymbolPtr[] = "__la_symbol_ptr";
 constexpr const char lazyBinding[] = "__lazy_binding";
 constexpr const char literals[] = "__literals";
+constexpr const char functionmap[] = "__llvm_merge";
 constexpr const char moduleInitFunc[] = "__mod_init_func";
 constexpr const char moduleTermFunc[] = "__mod_term_func";
 constexpr const char nonLazySymbolPtr[] = "__nl_symbol_ptr";
diff --git a/lld/MachO/LTO.cpp b/lld/MachO/LTO.cpp
index 28f5290edb58e3..9bddf9a6445f6d 100644
--- a/lld/MachO/LTO.cpp
+++ b/lld/MachO/LTO.cpp
@@ -25,6 +25,7 @@
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/raw_ostream.h"
+#include "llvm/Transforms/IPO.h"
 #include "llvm/Transforms/ObjCARC.h"
 
 using namespace lld;
@@ -38,6 +39,8 @@ static std::string getThinLTOOutputFile(StringRef modulePath) {
config->thinLTOPrefixReplaceNew);
 }
 
+extern cl::opt<bool> EnableGlobalMergeFunc;
+
 static lto::Config createConfig() {
   lto::Config c;
   c.Options = initTargetOptionsFromCodeGenFlags();
@@ -49,6 +52,10 @@ static lto::Config createConfig() {
   c.MAttrs = getMAttrs();
   c.DiagHandler = diagnosticHandler;
 
+  c.PreCodeGenPassesHook = [](legacy::PassManager &pm) {
+    if (EnableGlobalMergeFunc)
+      pm.add(createGlobalMergeFuncPass());
+  };
   c.AlwaysEmitRegularLTOObj = !config->ltoObjPath.empty();
 
   c.TimeTraceEnabled = config->timeTraceEnabled;
diff --git a/lld/test/MachO/cgdata-generate-merge.s b/lld/test/MachO/cgdata-generate-merge.s
new file mode 100644
index 00..3f7fb6777bc3cf
--- /dev/null
+++ b/lld/test/MachO/cgdata-generate-merge.s
@@ -0,0 +1,85 @@
+# UNSUPPORTED: system-windows
+# REQUIRES: aarch64
+
+# RUN: rm -rf %t; split-file %s %t
+
+# Synthesize raw cgdata without the header (32 byte) from the indexed cgdata.
+# RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata
+# RUN: od -t x1 -j 32 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
+# RUN: sed "s/<RAW_BYTES>/$(cat %t/raw-1-bytes.txt)/g" %t/merge-template.s > %t/merge-1.s
+# RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata
+# RUN: od -t x1 -j 

[llvm-branch-commits] [lld] [CGData][lld-macho] Add Global Merge Func Pass (PR #112674)

2024-10-17 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/112674

>From 549cf5d3880450641c720a6bc1f3bddae272f902 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Wed, 16 Oct 2024 22:56:38 -0700
Subject: [PATCH] [CGData][lld-macho] Add Global Merge Func Pass

---
 lld/MachO/CMakeLists.txt   |  2 +
 lld/MachO/Driver.cpp   | 18 +-
 lld/MachO/InputSection.h   |  1 +
 lld/MachO/LTO.cpp  |  7 +++
 lld/test/MachO/cgdata-generate-merge.s | 85 ++
 5 files changed, 112 insertions(+), 1 deletion(-)
 create mode 100644 lld/test/MachO/cgdata-generate-merge.s

diff --git a/lld/MachO/CMakeLists.txt b/lld/MachO/CMakeLists.txt
index ecf6ce609e59f2..137fe4939b4457 100644
--- a/lld/MachO/CMakeLists.txt
+++ b/lld/MachO/CMakeLists.txt
@@ -41,9 +41,11 @@ add_lld_library(lldMachO
   BitReader
   BitWriter
   CGData
+  CodeGen
   Core
   DebugInfoDWARF
   Demangle
+  IPO
   LTO
   MC
   ObjCARCOpts
diff --git a/lld/MachO/Driver.cpp b/lld/MachO/Driver.cpp
index ab4abb1fa97efc..59c24a06a2cb20 100644
--- a/lld/MachO/Driver.cpp
+++ b/lld/MachO/Driver.cpp
@@ -1326,7 +1326,8 @@ static void codegenDataGenerate() {
   TimeTraceScope timeScope("Generating codegen data");
 
   OutlinedHashTreeRecord globalOutlineRecord;
-  for (ConcatInputSection *isec : inputSections)
+  StableFunctionMapRecord globalMergeRecord;
+  for (ConcatInputSection *isec : inputSections) {
     if (isec->getSegName() == segment_names::data &&
         isec->getName() == section_names::outlinedHashTree) {
       // Read outlined hash tree from each section.
@@ -1337,10 +1338,25 @@ static void codegenDataGenerate() {
       // Merge it to the global hash tree.
       globalOutlineRecord.merge(localOutlineRecord);
     }
+    if (isec->getSegName() == segment_names::data &&
+        isec->getName() == section_names::functionmap) {
+      // Read stable functions from each section.
+      StableFunctionMapRecord localMergeRecord;
+      auto *data = isec->data.data();
+      localMergeRecord.deserialize(data);
+
+      // Merge it to the global function map.
+      globalMergeRecord.merge(localMergeRecord);
+    }
+  }
+
+  globalMergeRecord.finalize();
 
   CodeGenDataWriter Writer;
   if (!globalOutlineRecord.empty())
     Writer.addRecord(globalOutlineRecord);
+  if (!globalMergeRecord.empty())
+    Writer.addRecord(globalMergeRecord);
 
   std::error_code EC;
   auto fileName = config->codegenDataGeneratePath;
diff --git a/lld/MachO/InputSection.h b/lld/MachO/InputSection.h
index 7ef0e31066f372..b86520d36cda5b 100644
--- a/lld/MachO/InputSection.h
+++ b/lld/MachO/InputSection.h
@@ -339,6 +339,7 @@ constexpr const char const_[] = "__const";
 constexpr const char lazySymbolPtr[] = "__la_symbol_ptr";
 constexpr const char lazyBinding[] = "__lazy_binding";
 constexpr const char literals[] = "__literals";
+constexpr const char functionmap[] = "__llvm_merge";
 constexpr const char moduleInitFunc[] = "__mod_init_func";
 constexpr const char moduleTermFunc[] = "__mod_term_func";
 constexpr const char nonLazySymbolPtr[] = "__nl_symbol_ptr";
diff --git a/lld/MachO/LTO.cpp b/lld/MachO/LTO.cpp
index 28f5290edb58e3..9bddf9a6445f6d 100644
--- a/lld/MachO/LTO.cpp
+++ b/lld/MachO/LTO.cpp
@@ -25,6 +25,7 @@
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/raw_ostream.h"
+#include "llvm/Transforms/IPO.h"
 #include "llvm/Transforms/ObjCARC.h"
 
 using namespace lld;
@@ -38,6 +39,8 @@ static std::string getThinLTOOutputFile(StringRef modulePath) {
config->thinLTOPrefixReplaceNew);
 }
 
+extern cl::opt<bool> EnableGlobalMergeFunc;
+
 static lto::Config createConfig() {
   lto::Config c;
   c.Options = initTargetOptionsFromCodeGenFlags();
@@ -49,6 +52,10 @@ static lto::Config createConfig() {
   c.MAttrs = getMAttrs();
   c.DiagHandler = diagnosticHandler;
 
+  c.PreCodeGenPassesHook = [](legacy::PassManager &pm) {
+    if (EnableGlobalMergeFunc)
+      pm.add(createGlobalMergeFuncPass());
+  };
   c.AlwaysEmitRegularLTOObj = !config->ltoObjPath.empty();
 
   c.TimeTraceEnabled = config->timeTraceEnabled;
diff --git a/lld/test/MachO/cgdata-generate-merge.s b/lld/test/MachO/cgdata-generate-merge.s
new file mode 100644
index 00..3f7fb6777bc3cf
--- /dev/null
+++ b/lld/test/MachO/cgdata-generate-merge.s
@@ -0,0 +1,85 @@
+# UNSUPPORTED: system-windows
+# REQUIRES: aarch64
+
+# RUN: rm -rf %t; split-file %s %t
+
+# Synthesize raw cgdata without the header (32 byte) from the indexed cgdata.
+# RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata
+# RUN: od -t x1 -j 32 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
+# RUN: sed "s/<RAW_BYTES>/$(cat %t/raw-1-bytes.txt)/g" %t/merge-template.s > %t/merge-1.s
+# RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata
+# RUN: od -t x1 -j 

[llvm-branch-commits] [llvm] [CGData] Global Merge Functions (PR #112671)

2024-10-17 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com edited 
https://github.com/llvm/llvm-project/pull/112671


[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)

2024-10-17 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com edited 
https://github.com/llvm/llvm-project/pull/112638


[llvm-branch-commits] [lld] [CGData][lld-macho] Add Global Merge Func Pass (PR #112674)

2024-10-17 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com edited 
https://github.com/llvm/llvm-project/pull/112674


[llvm-branch-commits] [lld] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)

2024-10-17 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com edited 
https://github.com/llvm/llvm-project/pull/112664


[llvm-branch-commits] [llvm] [CGData] Stable Function Map (PR #112662)

2024-10-17 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com edited 
https://github.com/llvm/llvm-project/pull/112662


[llvm-branch-commits] [lld] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)

2024-10-29 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/112664

>From c7913f9fff736da4cc6a78a17e41dc539bc75e8a Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Mon, 9 Sep 2024 19:38:05 -0700
Subject: [PATCH 1/2] [CGData][llvm-cgdata] Support for stable function map

This introduces a new cgdata format for stable function maps.
The raw data is embedded in the __llvm_merge section during compile time.
This data can be read and merged using the llvm-cgdata tool, into an indexed 
cgdata file. Consequently, the tool is now capable of handling either outlined 
hash trees, stable function maps, or both, as they are orthogonal.
---
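Note for readers: the cgdata-generate.s RUN lines touched by this patch use
od, tr, and sed to skip the indexed-file header and turn the remaining bytes
into a comma-separated list of 0x-prefixed hex values for a .byte directive.
The function here is a rough C++ equivalent of that pipeline, offered only to
make the transformation easier to follow. The function name is hypothetical,
and the header size is passed in as a parameter (24 before this patch, 32
after, per the test's own comment).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Formats every byte after the first HeaderSize bytes as "0x..,0x..,...",
// using lowercase two-digit hex like `od -t x1`. Returns an empty string when
// the input is no longer than the header.
std::string rawBytesAfterHeader(const std::vector<uint8_t> &File,
                                std::size_t HeaderSize) {
  std::string Out;
  for (std::size_t I = HeaderSize; I < File.size(); ++I) {
    char Buf[8];
    std::snprintf(Buf, sizeof(Buf), "0x%02x", static_cast<unsigned>(File[I]));
    if (!Out.empty())
      Out += ','; // separator between bytes, none before the first
    Out += Buf;
  }
  return Out;
}
```

Passing the header size explicitly makes the 24 to 32 change in this patch a
one-argument difference, which is the same role the -j option plays in the od
invocation.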
 lld/test/MachO/cgdata-generate.s  |  6 +-
 llvm/docs/CommandGuide/llvm-cgdata.rst| 16 ++--
 llvm/include/llvm/CGData/CodeGenData.h| 24 +-
 llvm/include/llvm/CGData/CodeGenData.inc  | 12 ++-
 llvm/include/llvm/CGData/CodeGenDataReader.h  | 29 ++-
 llvm/include/llvm/CGData/CodeGenDataWriter.h  | 17 +++-
 llvm/lib/CGData/CodeGenData.cpp   | 30 ---
 llvm/lib/CGData/CodeGenDataReader.cpp | 63 +-
 llvm/lib/CGData/CodeGenDataWriter.cpp | 30 ++-
 llvm/test/tools/llvm-cgdata/empty.test|  8 +-
 llvm/test/tools/llvm-cgdata/error.test| 13 +--
 .../merge-combined-funcmap-hashtree.test  | 66 +++
 .../llvm-cgdata/merge-funcmap-archive.test| 83 +++
 .../llvm-cgdata/merge-funcmap-concat.test | 78 +
 .../llvm-cgdata/merge-funcmap-double.test | 79 ++
 .../llvm-cgdata/merge-funcmap-single.test | 36 
 ...chive.test => merge-hashtree-archive.test} |  8 +-
 ...concat.test => merge-hashtree-concat.test} |  6 +-
 ...double.test => merge-hashtree-double.test} |  8 +-
 ...single.test => merge-hashtree-single.test} |  4 +-
 llvm/tools/llvm-cgdata/llvm-cgdata.cpp| 48 ---
 21 files changed, 577 insertions(+), 87 deletions(-)
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-combined-funcmap-hashtree.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-archive.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-concat.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-double.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-single.test
 rename llvm/test/tools/llvm-cgdata/{merge-archive.test => merge-hashtree-archive.test} (91%)
 rename llvm/test/tools/llvm-cgdata/{merge-concat.test => merge-hashtree-concat.test} (93%)
 rename llvm/test/tools/llvm-cgdata/{merge-double.test => merge-hashtree-double.test} (90%)
 rename llvm/test/tools/llvm-cgdata/{merge-single.test => merge-hashtree-single.test} (92%)

diff --git a/lld/test/MachO/cgdata-generate.s b/lld/test/MachO/cgdata-generate.s
index 174df39d666c5d..f942ae07f64e0e 100644
--- a/lld/test/MachO/cgdata-generate.s
+++ b/lld/test/MachO/cgdata-generate.s
@@ -3,12 +3,12 @@
 
 # RUN: rm -rf %t; split-file %s %t
 
-# Synthesize raw cgdata without the header (24 byte) from the indexed cgdata.
+# Synthesize raw cgdata without the header (32 byte) from the indexed cgdata.
 # RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata
-# RUN: od -t x1 -j 24 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
+# RUN: od -t x1 -j 32 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
 # RUN: sed "s/<RAW_BYTES>/$(cat %t/raw-1-bytes.txt)/g" %t/merge-template.s > %t/merge-1.s
 # RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata
-# RUN: od -t x1 -j 24 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt
+# RUN: od -t x1 -j 32 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt
 # RUN: sed "s/<RAW_BYTES>/$(cat %t/raw-2-bytes.txt)/g" %t/merge-template.s > %t/merge-2.s
 
 # RUN: llvm-mc -filetype obj -triple arm64-apple-darwin %t/merge-1.s -o %t/merge-1.o
diff --git a/llvm/docs/CommandGuide/llvm-cgdata.rst b/llvm/docs/CommandGuide/llvm-cgdata.rst
index f592e1508844ee..0670decd087e39 100644
--- a/llvm/docs/CommandGuide/llvm-cgdata.rst
+++ b/llvm/docs/CommandGuide/llvm-cgdata.rst
@@ -11,15 +11,13 @@ SYNOPSIS
 DESCRIPTION
 ---
 
-The :program:`llvm-cgdata` utility parses raw codegen data embedded
-in compiled binary files and merges them into a single .cgdata file.
-It can also inspect and manipulate .cgdata files.
-Currently, the tool supports saving and restoring outlined hash trees,
-enabling global function outlining across modules, allowing for more
-efficient function outlining in subsequent compilations.
-The design is extensible, allowing for the incorporation of additional
-codegen summaries and optimization techniques, such as global function
-merging, in the f

[llvm-branch-commits] [lld] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)

2024-10-28 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com ready_for_review 
https://github.com/llvm/llvm-project/pull/112664


[llvm-branch-commits] [lld] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)

2024-10-27 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com edited 
https://github.com/llvm/llvm-project/pull/112664


[llvm-branch-commits] [lld] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)

2024-10-27 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/112664

>From c7913f9fff736da4cc6a78a17e41dc539bc75e8a Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Mon, 9 Sep 2024 19:38:05 -0700
Subject: [PATCH] [CGData][llvm-cgdata] Support for stable function map

This introduces a new cgdata format for stable function maps.
The raw data is embedded in the __llvm_merge section during compile time.
This data can be read and merged using the llvm-cgdata tool, into an indexed 
cgdata file. Consequently, the tool is now capable of handling either outlined 
hash trees, stable function maps, or both, as they are orthogonal.
---
 lld/test/MachO/cgdata-generate.s  |  6 +-
 llvm/docs/CommandGuide/llvm-cgdata.rst| 16 ++--
 llvm/include/llvm/CGData/CodeGenData.h| 24 +-
 llvm/include/llvm/CGData/CodeGenData.inc  | 12 ++-
 llvm/include/llvm/CGData/CodeGenDataReader.h  | 29 ++-
 llvm/include/llvm/CGData/CodeGenDataWriter.h  | 17 +++-
 llvm/lib/CGData/CodeGenData.cpp   | 30 ---
 llvm/lib/CGData/CodeGenDataReader.cpp | 63 +-
 llvm/lib/CGData/CodeGenDataWriter.cpp | 30 ++-
 llvm/test/tools/llvm-cgdata/empty.test|  8 +-
 llvm/test/tools/llvm-cgdata/error.test| 13 +--
 .../merge-combined-funcmap-hashtree.test  | 66 +++
 .../llvm-cgdata/merge-funcmap-archive.test| 83 +++
 .../llvm-cgdata/merge-funcmap-concat.test | 78 +
 .../llvm-cgdata/merge-funcmap-double.test | 79 ++
 .../llvm-cgdata/merge-funcmap-single.test | 36 
 ...chive.test => merge-hashtree-archive.test} |  8 +-
 ...concat.test => merge-hashtree-concat.test} |  6 +-
 ...double.test => merge-hashtree-double.test} |  8 +-
 ...single.test => merge-hashtree-single.test} |  4 +-
 llvm/tools/llvm-cgdata/llvm-cgdata.cpp| 48 ---
 21 files changed, 577 insertions(+), 87 deletions(-)
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-combined-funcmap-hashtree.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-archive.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-concat.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-double.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-single.test
 rename llvm/test/tools/llvm-cgdata/{merge-archive.test => merge-hashtree-archive.test} (91%)
 rename llvm/test/tools/llvm-cgdata/{merge-concat.test => merge-hashtree-concat.test} (93%)
 rename llvm/test/tools/llvm-cgdata/{merge-double.test => merge-hashtree-double.test} (90%)
 rename llvm/test/tools/llvm-cgdata/{merge-single.test => merge-hashtree-single.test} (92%)

diff --git a/lld/test/MachO/cgdata-generate.s b/lld/test/MachO/cgdata-generate.s
index 174df39d666c5d..f942ae07f64e0e 100644
--- a/lld/test/MachO/cgdata-generate.s
+++ b/lld/test/MachO/cgdata-generate.s
@@ -3,12 +3,12 @@
 
 # RUN: rm -rf %t; split-file %s %t
 
-# Synthesize raw cgdata without the header (24 byte) from the indexed cgdata.
+# Synthesize raw cgdata without the header (32 byte) from the indexed cgdata.
 # RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata
-# RUN: od -t x1 -j 24 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
+# RUN: od -t x1 -j 32 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
 # RUN: sed "s//$(cat %t/raw-1-bytes.txt)/g" %t/merge-template.s > %t/merge-1.s
 # RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata
-# RUN: od -t x1 -j 24 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt
+# RUN: od -t x1 -j 32 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt
 # RUN: sed "s//$(cat %t/raw-2-bytes.txt)/g" %t/merge-template.s > %t/merge-2.s
 
 # RUN: llvm-mc -filetype obj -triple arm64-apple-darwin %t/merge-1.s -o %t/merge-1.o
diff --git a/llvm/docs/CommandGuide/llvm-cgdata.rst b/llvm/docs/CommandGuide/llvm-cgdata.rst
index f592e1508844ee..0670decd087e39 100644
--- a/llvm/docs/CommandGuide/llvm-cgdata.rst
+++ b/llvm/docs/CommandGuide/llvm-cgdata.rst
@@ -11,15 +11,13 @@ SYNOPSIS
 DESCRIPTION
 ---
 
-The :program:llvm-cgdata utility parses raw codegen data embedded
-in compiled binary files and merges them into a single .cgdata file.
-It can also inspect and manipulate .cgdata files.
-Currently, the tool supports saving and restoring outlined hash trees,
-enabling global function outlining across modules, allowing for more
-efficient function outlining in subsequent compilations.
-The design is extensible, allowing for the incorporation of additional
-codegen summaries and optimization techniques, such as global function
-merging, in the futur

[llvm-branch-commits] [lld] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)

2024-10-28 Thread Kyungwoo Lee via llvm-branch-commits

kyulee-com wrote:

cc. @nocchijiang 

https://github.com/llvm/llvm-project/pull/112664


[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)

2024-11-11 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com created 
https://github.com/llvm/llvm-project/pull/115750

None

>From 70dcb2ccba98b392c3539f349ccf7fec284a674c Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Mon, 11 Nov 2024 10:06:56 -0800
Subject: [PATCH] [CGData] Refactor Global Merge Functions

---
 llvm/lib/CodeGen/GlobalMergeFunctions.cpp | 148 +-
 1 file changed, 59 insertions(+), 89 deletions(-)

diff --git a/llvm/lib/CodeGen/GlobalMergeFunctions.cpp b/llvm/lib/CodeGen/GlobalMergeFunctions.cpp
index 2b367ca87d9008..df8dbb8a73b95d 100644
--- a/llvm/lib/CodeGen/GlobalMergeFunctions.cpp
+++ b/llvm/lib/CodeGen/GlobalMergeFunctions.cpp
@@ -31,14 +31,6 @@ static cl::opt<bool> DisableCGDataForMerging(
  "merging is still enabled within a module."),
 cl::init(false));
 
-STATISTIC(NumMismatchedFunctionHash,
-  "Number of mismatched function hash for global merge function");
-STATISTIC(NumMismatchedInstCount,
-  "Number of mismatched instruction count for global merge function");
-STATISTIC(NumMismatchedConstHash,
-  "Number of mismatched const hash for global merge function");
-STATISTIC(NumMismatchedModuleId,
-  "Number of mismatched Module Id for global merge function");
 STATISTIC(NumMergedFunctions,
   "Number of functions that are actually merged using function hash");
 STATISTIC(NumAnalyzedModues, "Number of modules that are analyzed");
@@ -203,9 +195,9 @@ void GlobalMergeFunc::analyze(Module &M) {
 struct FuncMergeInfo {
   StableFunctionMap::StableFunctionEntry *SF;
   Function *F;
-  std::unique_ptr<IndexInstrMap> IndexInstruction;
+  IndexInstrMap *IndexInstruction;
   FuncMergeInfo(StableFunctionMap::StableFunctionEntry *SF, Function *F,
-                std::unique_ptr<IndexInstrMap> IndexInstruction)
+                IndexInstrMap *IndexInstruction)
   : SF(SF), F(F), IndexInstruction(std::move(IndexInstruction)) {}
 };
 
@@ -420,101 +412,79 @@ static ParamLocsVecTy computeParamInfo(
 bool GlobalMergeFunc::merge(Module &M, const StableFunctionMap *FunctionMap) {
   bool Changed = false;
 
-  // Build a map from stable function name to function.
-  StringMap StableNameToFuncMap;
-  for (auto &F : M)
-StableNameToFuncMap[get_stable_name(F.getName())] = &F;
-  // Track merged functions
-  DenseSet MergedFunctions;
-
-  auto ModId = M.getModuleIdentifier();
-  for (auto &[Hash, SFS] : FunctionMap->getFunctionMap()) {
-// Parameter locations based on the unique hash sequences
-// across the candidates.
+  // Collect stable functions related to the current module.
+  DenseMap> HashToFuncs;
+  DenseMap FuncToFI;
+  auto &Maps = FunctionMap->getFunctionMap();
+  for (auto &F : M) {
+if (!isEligibleFunction(&F))
+  continue;
+auto FI = llvm::StructuralHashWithDifferences(F, ignoreOp);
+if (Maps.contains(FI.FunctionHash)) {
+  HashToFuncs[FI.FunctionHash].push_back(&F);
+  FuncToFI.try_emplace(&F, std::move(FI));
+}
+  }
+
+  for (auto &[Hash, Funcs] : HashToFuncs) {
 std::optional ParamLocsVec;
-Function *MergedFunc = nullptr;
-std::string MergedModId;
 SmallVector FuncMergeInfos;
-for (auto &SF : SFS) {
-  // Get the function from the stable name.
-  auto I = StableNameToFuncMap.find(
-  *FunctionMap->getNameForId(SF->FunctionNameId));
-  if (I == StableNameToFuncMap.end())
-continue;
-  Function *F = I->second;
-  assert(F);
-  // Skip if the function has been merged before.
-  if (MergedFunctions.count(F))
-continue;
-  // Consider the function if it is eligible for merging.
-  if (!isEligibleFunction(F))
-continue;
 
-  auto FI = llvm::StructuralHashWithDifferences(*F, ignoreOp);
-  uint64_t FuncHash = FI.FunctionHash;
-  if (Hash != FuncHash) {
-++NumMismatchedFunctionHash;
-continue;
-  }
+// Iterate functions with the same hash.
+for (auto &F : Funcs) {
+  auto &SFS = Maps.at(Hash);
+  auto &FI = FuncToFI.at(F);
 
-  if (SF->InstCount != FI.IndexInstruction->size()) {
-++NumMismatchedInstCount;
+  // Check if the function is compatible with any stable function
+  // in terms of the number of instructions and ignored operands.
+  assert(!SFS.empty());
+  auto &RFS = SFS[0];
+  if (RFS->InstCount != FI.IndexInstruction->size())
 continue;
-  }
-  bool HasValidSharedConst = true;
-  for (auto &[Index, Hash] : *SF->IndexOperandHashMap) {
-auto [InstIndex, OpndIndex] = Index;
-assert(InstIndex < FI.IndexInstruction->size());
-auto *Inst = FI.IndexInstruction->lookup(InstIndex);
-if (!ignoreOp(Inst, OpndIndex)) {
-  HasValidSharedConst = false;
-  break;
-}
-  }
-  if (!HasValidSharedConst) {
-++NumMismatchedConstHash;
-continue;
-  }
-  if (!checkConstHashCompatible(*SF->IndexOperandHashMap,
-*FI.IndexOperandHashMap)) {
-

[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)

2024-11-11 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com edited 
https://github.com/llvm/llvm-project/pull/115750


[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)

2024-11-11 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/115750

>From 70dcb2ccba98b392c3539f349ccf7fec284a674c Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Mon, 11 Nov 2024 10:06:56 -0800
Subject: [PATCH 1/2] [CGData] Refactor Global Merge Functions

---
 llvm/lib/CodeGen/GlobalMergeFunctions.cpp | 148 +-
 1 file changed, 59 insertions(+), 89 deletions(-)

diff --git a/llvm/lib/CodeGen/GlobalMergeFunctions.cpp b/llvm/lib/CodeGen/GlobalMergeFunctions.cpp
index 2b367ca87d9008..df8dbb8a73b95d 100644
--- a/llvm/lib/CodeGen/GlobalMergeFunctions.cpp
+++ b/llvm/lib/CodeGen/GlobalMergeFunctions.cpp
@@ -31,14 +31,6 @@ static cl::opt<bool> DisableCGDataForMerging(
  "merging is still enabled within a module."),
 cl::init(false));
 
-STATISTIC(NumMismatchedFunctionHash,
-  "Number of mismatched function hash for global merge function");
-STATISTIC(NumMismatchedInstCount,
-  "Number of mismatched instruction count for global merge function");
-STATISTIC(NumMismatchedConstHash,
-  "Number of mismatched const hash for global merge function");
-STATISTIC(NumMismatchedModuleId,
-  "Number of mismatched Module Id for global merge function");
 STATISTIC(NumMergedFunctions,
   "Number of functions that are actually merged using function hash");
 STATISTIC(NumAnalyzedModues, "Number of modules that are analyzed");
@@ -203,9 +195,9 @@ void GlobalMergeFunc::analyze(Module &M) {
 struct FuncMergeInfo {
   StableFunctionMap::StableFunctionEntry *SF;
   Function *F;
-  std::unique_ptr<IndexInstrMap> IndexInstruction;
+  IndexInstrMap *IndexInstruction;
   FuncMergeInfo(StableFunctionMap::StableFunctionEntry *SF, Function *F,
-                std::unique_ptr<IndexInstrMap> IndexInstruction)
+                IndexInstrMap *IndexInstruction)
   : SF(SF), F(F), IndexInstruction(std::move(IndexInstruction)) {}
 };
 
@@ -420,101 +412,79 @@ static ParamLocsVecTy computeParamInfo(
 bool GlobalMergeFunc::merge(Module &M, const StableFunctionMap *FunctionMap) {
   bool Changed = false;
 
-  // Build a map from stable function name to function.
-  StringMap StableNameToFuncMap;
-  for (auto &F : M)
-StableNameToFuncMap[get_stable_name(F.getName())] = &F;
-  // Track merged functions
-  DenseSet MergedFunctions;
-
-  auto ModId = M.getModuleIdentifier();
-  for (auto &[Hash, SFS] : FunctionMap->getFunctionMap()) {
-// Parameter locations based on the unique hash sequences
-// across the candidates.
+  // Collect stable functions related to the current module.
+  DenseMap> HashToFuncs;
+  DenseMap FuncToFI;
+  auto &Maps = FunctionMap->getFunctionMap();
+  for (auto &F : M) {
+if (!isEligibleFunction(&F))
+  continue;
+auto FI = llvm::StructuralHashWithDifferences(F, ignoreOp);
+if (Maps.contains(FI.FunctionHash)) {
+  HashToFuncs[FI.FunctionHash].push_back(&F);
+  FuncToFI.try_emplace(&F, std::move(FI));
+}
+  }
+
+  for (auto &[Hash, Funcs] : HashToFuncs) {
 std::optional ParamLocsVec;
-Function *MergedFunc = nullptr;
-std::string MergedModId;
 SmallVector FuncMergeInfos;
-for (auto &SF : SFS) {
-  // Get the function from the stable name.
-  auto I = StableNameToFuncMap.find(
-  *FunctionMap->getNameForId(SF->FunctionNameId));
-  if (I == StableNameToFuncMap.end())
-continue;
-  Function *F = I->second;
-  assert(F);
-  // Skip if the function has been merged before.
-  if (MergedFunctions.count(F))
-continue;
-  // Consider the function if it is eligible for merging.
-  if (!isEligibleFunction(F))
-continue;
 
-  auto FI = llvm::StructuralHashWithDifferences(*F, ignoreOp);
-  uint64_t FuncHash = FI.FunctionHash;
-  if (Hash != FuncHash) {
-++NumMismatchedFunctionHash;
-continue;
-  }
+// Iterate functions with the same hash.
+for (auto &F : Funcs) {
+  auto &SFS = Maps.at(Hash);
+  auto &FI = FuncToFI.at(F);
 
-  if (SF->InstCount != FI.IndexInstruction->size()) {
-++NumMismatchedInstCount;
+  // Check if the function is compatible with any stable function
+  // in terms of the number of instructions and ignored operands.
+  assert(!SFS.empty());
+  auto &RFS = SFS[0];
+  if (RFS->InstCount != FI.IndexInstruction->size())
 continue;
-  }
-  bool HasValidSharedConst = true;
-  for (auto &[Index, Hash] : *SF->IndexOperandHashMap) {
-auto [InstIndex, OpndIndex] = Index;
-assert(InstIndex < FI.IndexInstruction->size());
-auto *Inst = FI.IndexInstruction->lookup(InstIndex);
-if (!ignoreOp(Inst, OpndIndex)) {
-  HasValidSharedConst = false;
-  break;
-}
-  }
-  if (!HasValidSharedConst) {
-++NumMismatchedConstHash;
-continue;
-  }
-  if (!checkConstHashCompatible(*SF->IndexOperandHashMap,
-*FI.IndexOperandHashMap)) {
-  

[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)

2024-11-11 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com edited 
https://github.com/llvm/llvm-project/pull/115750


[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)

2024-11-11 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com ready_for_review 
https://github.com/llvm/llvm-project/pull/115750


[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)

2024-11-11 Thread Kyungwoo Lee via llvm-branch-commits


@@ -420,101 +412,79 @@ static ParamLocsVecTy computeParamInfo(
 bool GlobalMergeFunc::merge(Module &M, const StableFunctionMap *FunctionMap) {
   bool Changed = false;
 
-  // Build a map from stable function name to function.
-  StringMap StableNameToFuncMap;
-  for (auto &F : M)
-StableNameToFuncMap[get_stable_name(F.getName())] = &F;
-  // Track merged functions
-  DenseSet MergedFunctions;
-
-  auto ModId = M.getModuleIdentifier();
-  for (auto &[Hash, SFS] : FunctionMap->getFunctionMap()) {
-// Parameter locations based on the unique hash sequences
-// across the candidates.
+  // Collect stable functions related to the current module.
+  DenseMap> HashToFuncs;
+  DenseMap FuncToFI;
+  auto &Maps = FunctionMap->getFunctionMap();
+  for (auto &F : M) {
+if (!isEligibleFunction(&F))
+  continue;
+auto FI = llvm::StructuralHashWithDifferences(F, ignoreOp);
+if (Maps.contains(FI.FunctionHash)) {
+  HashToFuncs[FI.FunctionHash].push_back(&F);
+  FuncToFI.try_emplace(&F, std::move(FI));
+}
+  }
+
+  for (auto &[Hash, Funcs] : HashToFuncs) {
 std::optional ParamLocsVec;
-Function *MergedFunc = nullptr;
-std::string MergedModId;
 SmallVector FuncMergeInfos;
-for (auto &SF : SFS) {
-  // Get the function from the stable name.
-  auto I = StableNameToFuncMap.find(
-  *FunctionMap->getNameForId(SF->FunctionNameId));
-  if (I == StableNameToFuncMap.end())
-continue;
-  Function *F = I->second;
-  assert(F);
-  // Skip if the function has been merged before.
-  if (MergedFunctions.count(F))
-continue;
-  // Consider the function if it is eligible for merging.
-  if (!isEligibleFunction(F))
-continue;
 
-  auto FI = llvm::StructuralHashWithDifferences(*F, ignoreOp);
-  uint64_t FuncHash = FI.FunctionHash;
-  if (Hash != FuncHash) {
-++NumMismatchedFunctionHash;
-continue;
-  }
+// Iterate functions with the same hash.
+for (auto &F : Funcs) {
+  auto &SFS = Maps.at(Hash);
+  auto &FI = FuncToFI.at(F);
 
-  if (SF->InstCount != FI.IndexInstruction->size()) {
-++NumMismatchedInstCount;
+  // Check if the function is compatible with any stable function
+  // in terms of the number of instructions and ignored operands.
+  assert(!SFS.empty());
+  auto &RFS = SFS[0];

kyulee-com wrote:

If the codegen data is not stale (meaning there has been no source change in 
the first or second pass), the size regression is less of a concern. My main 
worry was about the stability of optimization when the codegen data becomes 
outdated.
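
To make the staleness concern concrete, here is a toy version of the lookup this PR moves to: functions in the current module are hashed, and only those whose hash and instruction count still match the recorded entry become merge candidates. This is a simplified sketch, not the pass itself. `ToyFunc`, `ToyEntry`, and `collectCandidates` are hypothetical names, and the real code works on `Function` and `StableFunctionMap`.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins: a function in the current module and the entry
// recorded for a hash in the (possibly outdated) codegen data.
struct ToyFunc {
  std::string Name;
  std::uint64_t Hash;
  unsigned InstCount;
};
struct ToyEntry {
  unsigned InstCount;
};

// Groups module functions by hash, keeping only those that still agree with
// the recorded data. Functions whose source changed simply drop out.
std::map<std::uint64_t, std::vector<std::string>>
collectCandidates(const std::vector<ToyFunc> &Module,
                  const std::map<std::uint64_t, ToyEntry> &Recorded) {
  std::map<std::uint64_t, std::vector<std::string>> HashToFuncs;
  for (const ToyFunc &F : Module) {
    auto It = Recorded.find(F.Hash);
    if (It == Recorded.end())
      continue; // Hash not recorded: changed or never seen.
    if (It->second.InstCount != F.InstCount)
      continue; // Same hash but a different shape: treat as incompatible.
    HashToFuncs[F.Hash].push_back(F.Name);
  }
  return HashToFuncs;
}
```

With stale data the result only shrinks: fewer functions qualify, so the lost merges show up as a size regression rather than as incorrect merging.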

https://github.com/llvm/llvm-project/pull/115750


[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)

2024-11-11 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/115750

>From 70dcb2ccba98b392c3539f349ccf7fec284a674c Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Mon, 11 Nov 2024 10:06:56 -0800
Subject: [PATCH 1/2] [CGData] Refactor Global Merge Functions

---
 llvm/lib/CodeGen/GlobalMergeFunctions.cpp | 148 +-
 1 file changed, 59 insertions(+), 89 deletions(-)

diff --git a/llvm/lib/CodeGen/GlobalMergeFunctions.cpp b/llvm/lib/CodeGen/GlobalMergeFunctions.cpp
index 2b367ca87d9008..df8dbb8a73b95d 100644
--- a/llvm/lib/CodeGen/GlobalMergeFunctions.cpp
+++ b/llvm/lib/CodeGen/GlobalMergeFunctions.cpp
@@ -31,14 +31,6 @@ static cl::opt<bool> DisableCGDataForMerging(
  "merging is still enabled within a module."),
 cl::init(false));
 
-STATISTIC(NumMismatchedFunctionHash,
-  "Number of mismatched function hash for global merge function");
-STATISTIC(NumMismatchedInstCount,
-  "Number of mismatched instruction count for global merge function");
-STATISTIC(NumMismatchedConstHash,
-  "Number of mismatched const hash for global merge function");
-STATISTIC(NumMismatchedModuleId,
-  "Number of mismatched Module Id for global merge function");
 STATISTIC(NumMergedFunctions,
   "Number of functions that are actually merged using function hash");
 STATISTIC(NumAnalyzedModues, "Number of modules that are analyzed");
@@ -203,9 +195,9 @@ void GlobalMergeFunc::analyze(Module &M) {
 struct FuncMergeInfo {
   StableFunctionMap::StableFunctionEntry *SF;
   Function *F;
-  std::unique_ptr<IndexInstrMap> IndexInstruction;
+  IndexInstrMap *IndexInstruction;
   FuncMergeInfo(StableFunctionMap::StableFunctionEntry *SF, Function *F,
-                std::unique_ptr<IndexInstrMap> IndexInstruction)
+                IndexInstrMap *IndexInstruction)
   : SF(SF), F(F), IndexInstruction(std::move(IndexInstruction)) {}
 };
 
@@ -420,101 +412,79 @@ static ParamLocsVecTy computeParamInfo(
 bool GlobalMergeFunc::merge(Module &M, const StableFunctionMap *FunctionMap) {
   bool Changed = false;
 
-  // Build a map from stable function name to function.
-  StringMap StableNameToFuncMap;
-  for (auto &F : M)
-StableNameToFuncMap[get_stable_name(F.getName())] = &F;
-  // Track merged functions
-  DenseSet MergedFunctions;
-
-  auto ModId = M.getModuleIdentifier();
-  for (auto &[Hash, SFS] : FunctionMap->getFunctionMap()) {
-// Parameter locations based on the unique hash sequences
-// across the candidates.
+  // Collect stable functions related to the current module.
+  DenseMap> HashToFuncs;
+  DenseMap FuncToFI;
+  auto &Maps = FunctionMap->getFunctionMap();
+  for (auto &F : M) {
+if (!isEligibleFunction(&F))
+  continue;
+auto FI = llvm::StructuralHashWithDifferences(F, ignoreOp);
+if (Maps.contains(FI.FunctionHash)) {
+  HashToFuncs[FI.FunctionHash].push_back(&F);
+  FuncToFI.try_emplace(&F, std::move(FI));
+}
+  }
+
+  for (auto &[Hash, Funcs] : HashToFuncs) {
 std::optional ParamLocsVec;
-Function *MergedFunc = nullptr;
-std::string MergedModId;
 SmallVector FuncMergeInfos;
-for (auto &SF : SFS) {
-  // Get the function from the stable name.
-  auto I = StableNameToFuncMap.find(
-  *FunctionMap->getNameForId(SF->FunctionNameId));
-  if (I == StableNameToFuncMap.end())
-continue;
-  Function *F = I->second;
-  assert(F);
-  // Skip if the function has been merged before.
-  if (MergedFunctions.count(F))
-continue;
-  // Consider the function if it is eligible for merging.
-  if (!isEligibleFunction(F))
-continue;
 
-  auto FI = llvm::StructuralHashWithDifferences(*F, ignoreOp);
-  uint64_t FuncHash = FI.FunctionHash;
-  if (Hash != FuncHash) {
-++NumMismatchedFunctionHash;
-continue;
-  }
+// Iterate functions with the same hash.
+for (auto &F : Funcs) {
+  auto &SFS = Maps.at(Hash);
+  auto &FI = FuncToFI.at(F);
 
-  if (SF->InstCount != FI.IndexInstruction->size()) {
-++NumMismatchedInstCount;
+  // Check if the function is compatible with any stable function
+  // in terms of the number of instructions and ignored operands.
+  assert(!SFS.empty());
+  auto &RFS = SFS[0];
+  if (RFS->InstCount != FI.IndexInstruction->size())
 continue;
-  }
-  bool HasValidSharedConst = true;
-  for (auto &[Index, Hash] : *SF->IndexOperandHashMap) {
-auto [InstIndex, OpndIndex] = Index;
-assert(InstIndex < FI.IndexInstruction->size());
-auto *Inst = FI.IndexInstruction->lookup(InstIndex);
-if (!ignoreOp(Inst, OpndIndex)) {
-  HasValidSharedConst = false;
-  break;
-}
-  }
-  if (!HasValidSharedConst) {
-++NumMismatchedConstHash;
-continue;
-  }
-  if (!checkConstHashCompatible(*SF->IndexOperandHashMap,
-*FI.IndexOperandHashMap)) {
-  

[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)

2024-11-13 Thread Kyungwoo Lee via llvm-branch-commits

kyulee-com wrote:

@nocchijiang The new approach seems to be functioning well and is similar in 
size to the previous method.
I suspect that the no-LTO case might still encounter some slowdown, as each CU 
needs to read the entire CGData regardless. Currently, the CGData used for this 
merging process does not utilize names, which means we could potentially 
eliminate strings or make them optional. Alternatively, we could restructure 
the indexed CGData to allow for reading only the relevant hash entries on 
demand. I'd like to leave these options open for now, and if you can continue 
to improve it, that would be excellent.
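
One way to picture the on-demand option is an index sorted by hash that is searched first, so that only matching entries need to be decoded. The sketch below is hypothetical throughout: `IndexEntry`, `findEntries`, and the offset layout are illustration-only and do not reflect the actual indexed CGData format.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical index record: a function hash and the byte offset of its
// serialized entry in the data file.
struct IndexEntry {
  std::uint64_t Hash;
  std::uint64_t Offset;
};

// Returns the offsets of all entries whose hash equals Hash.
// Assumes Index is sorted by Hash, so the search is logarithmic and the
// caller only has to decode the entries it actually needs.
std::vector<std::uint64_t> findEntries(const std::vector<IndexEntry> &Index,
                                       std::uint64_t Hash) {
  auto ByHash = [](const IndexEntry &E, std::uint64_t H) { return E.Hash < H; };
  auto It = std::lower_bound(Index.begin(), Index.end(), Hash, ByHash);
  std::vector<std::uint64_t> Offsets;
  for (; It != Index.end() && It->Hash == Hash; ++It)
    Offsets.push_back(It->Offset);
  return Offsets;
}
```

A CU would then query only the hashes of its own functions instead of loading the whole map.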

https://github.com/llvm/llvm-project/pull/115750


[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)

2024-11-12 Thread Kyungwoo Lee via llvm-branch-commits

kyulee-com wrote:

> Do we know why `OpIdx` is 4 here? This is confusing to me because it looks 
> like there is only one argument, `%5`.

The `ignoreOp` function was initially designed for use with 
`llvm::StructuralHashWithDifferences`, where it iterates over operands within 
the same instruction. In that context, `OpIdx` is always within the valid range 
for the given instruction.
However, we now also use this function during the merge operation to determine 
whether a particular operand, as matched in the stable function summary, can be 
ignored in a given instruction; see `hasValidSharedConst()` for its use. So an 
out-of-range index may be passed from a different instruction context (even 
though the entire function hash matches). In this case, we should simply return 
false, as the target operand is not an interesting operand (one that can be 
ignored/parameterized for merging).


https://github.com/llvm/llvm-project/pull/115750


[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)

2024-11-12 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com edited 
https://github.com/llvm/llvm-project/pull/115750


[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)

2024-11-12 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com updated 
https://github.com/llvm/llvm-project/pull/115750

>From 70dcb2ccba98b392c3539f349ccf7fec284a674c Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Mon, 11 Nov 2024 10:06:56 -0800
Subject: [PATCH 1/3] [CGData] Refactor Global Merge Functions

---
 llvm/lib/CodeGen/GlobalMergeFunctions.cpp | 148 +-
 1 file changed, 59 insertions(+), 89 deletions(-)

diff --git a/llvm/lib/CodeGen/GlobalMergeFunctions.cpp b/llvm/lib/CodeGen/GlobalMergeFunctions.cpp
index 2b367ca87d9008..df8dbb8a73b95d 100644
--- a/llvm/lib/CodeGen/GlobalMergeFunctions.cpp
+++ b/llvm/lib/CodeGen/GlobalMergeFunctions.cpp
@@ -31,14 +31,6 @@ static cl::opt<bool> DisableCGDataForMerging(
  "merging is still enabled within a module."),
 cl::init(false));
 
-STATISTIC(NumMismatchedFunctionHash,
-  "Number of mismatched function hash for global merge function");
-STATISTIC(NumMismatchedInstCount,
-  "Number of mismatched instruction count for global merge function");
-STATISTIC(NumMismatchedConstHash,
-  "Number of mismatched const hash for global merge function");
-STATISTIC(NumMismatchedModuleId,
-  "Number of mismatched Module Id for global merge function");
 STATISTIC(NumMergedFunctions,
   "Number of functions that are actually merged using function hash");
 STATISTIC(NumAnalyzedModues, "Number of modules that are analyzed");
@@ -203,9 +195,9 @@ void GlobalMergeFunc::analyze(Module &M) {
 struct FuncMergeInfo {
   StableFunctionMap::StableFunctionEntry *SF;
   Function *F;
-  std::unique_ptr<IndexInstrMap> IndexInstruction;
+  IndexInstrMap *IndexInstruction;
   FuncMergeInfo(StableFunctionMap::StableFunctionEntry *SF, Function *F,
-                std::unique_ptr<IndexInstrMap> IndexInstruction)
+                IndexInstrMap *IndexInstruction)
   : SF(SF), F(F), IndexInstruction(std::move(IndexInstruction)) {}
 };
 
@@ -420,101 +412,79 @@ static ParamLocsVecTy computeParamInfo(
 bool GlobalMergeFunc::merge(Module &M, const StableFunctionMap *FunctionMap) {
   bool Changed = false;
 
-  // Build a map from stable function name to function.
-  StringMap StableNameToFuncMap;
-  for (auto &F : M)
-StableNameToFuncMap[get_stable_name(F.getName())] = &F;
-  // Track merged functions
-  DenseSet MergedFunctions;
-
-  auto ModId = M.getModuleIdentifier();
-  for (auto &[Hash, SFS] : FunctionMap->getFunctionMap()) {
-// Parameter locations based on the unique hash sequences
-// across the candidates.
+  // Collect stable functions related to the current module.
+  DenseMap> HashToFuncs;
+  DenseMap FuncToFI;
+  auto &Maps = FunctionMap->getFunctionMap();
+  for (auto &F : M) {
+if (!isEligibleFunction(&F))
+  continue;
+auto FI = llvm::StructuralHashWithDifferences(F, ignoreOp);
+if (Maps.contains(FI.FunctionHash)) {
+  HashToFuncs[FI.FunctionHash].push_back(&F);
+  FuncToFI.try_emplace(&F, std::move(FI));
+}
+  }
+
+  for (auto &[Hash, Funcs] : HashToFuncs) {
 std::optional ParamLocsVec;
-Function *MergedFunc = nullptr;
-std::string MergedModId;
 SmallVector FuncMergeInfos;
-for (auto &SF : SFS) {
-  // Get the function from the stable name.
-  auto I = StableNameToFuncMap.find(
-  *FunctionMap->getNameForId(SF->FunctionNameId));
-  if (I == StableNameToFuncMap.end())
-continue;
-  Function *F = I->second;
-  assert(F);
-  // Skip if the function has been merged before.
-  if (MergedFunctions.count(F))
-continue;
-  // Consider the function if it is eligible for merging.
-  if (!isEligibleFunction(F))
-continue;
 
-  auto FI = llvm::StructuralHashWithDifferences(*F, ignoreOp);
-  uint64_t FuncHash = FI.FunctionHash;
-  if (Hash != FuncHash) {
-++NumMismatchedFunctionHash;
-continue;
-  }
+// Iterate functions with the same hash.
+for (auto &F : Funcs) {
+  auto &SFS = Maps.at(Hash);
+  auto &FI = FuncToFI.at(F);
 
-  if (SF->InstCount != FI.IndexInstruction->size()) {
-++NumMismatchedInstCount;
+  // Check if the function is compatible with any stable function
+  // in terms of the number of instructions and ignored operands.
+  assert(!SFS.empty());
+  auto &RFS = SFS[0];
+  if (RFS->InstCount != FI.IndexInstruction->size())
 continue;
-  }
-  bool HasValidSharedConst = true;
-  for (auto &[Index, Hash] : *SF->IndexOperandHashMap) {
-auto [InstIndex, OpndIndex] = Index;
-assert(InstIndex < FI.IndexInstruction->size());
-auto *Inst = FI.IndexInstruction->lookup(InstIndex);
-if (!ignoreOp(Inst, OpndIndex)) {
-  HasValidSharedConst = false;
-  break;
-}
-  }
-  if (!HasValidSharedConst) {
-++NumMismatchedConstHash;
-continue;
-  }
-  if (!checkConstHashCompatible(*SF->IndexOperandHashMap,
-*FI.IndexOperandHashMap)) {
-  

[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)

2024-11-12 Thread Kyungwoo Lee via llvm-branch-commits

kyulee-com wrote:

> Hit an assertion in `ignoreOp` when testing the refactored code.
> 
> ```
> Assertion failed: (OpIdx < I->getNumOperands() && "Invalid operand index"), 
> function ignoreOp, file GlobalMergeFunctions.cpp, line 129.
> Stop reason: hit program assert
> expr I->dump()
>   %6 = tail call ptr @objc_retain(ptr %5), !dbg !576
> 
> p I->getNumOperands()
> 
> (unsigned int) 2
> p OpIdx
> 
> (unsigned int) 4
> ```

Thank you for testing and identifying this bug!
Since we also use this function to verify any function that matches a hash, it 
should not assert.

https://github.com/llvm/llvm-project/pull/115750


[llvm-branch-commits] [llvm] test2 (PR #109137)

2024-09-18 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com created 
https://github.com/llvm/llvm-project/pull/109137

None

>From 32ae0b07276f7ccbdc5dd6675e0c46b507625449 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Wed, 18 Sep 2024 06:05:41 -0700
Subject: [PATCH] test2

---
 llvm/lib/LTO/LTO.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/lib/LTO/LTO.cpp b/llvm/lib/LTO/LTO.cpp
index 9a01edd70e08c9..d33815ff704128 100644
--- a/llvm/lib/LTO/LTO.cpp
+++ b/llvm/lib/LTO/LTO.cpp
@@ -1371,6 +1371,7 @@ SmallVector 
LTO::getRuntimeLibcallSymbols(const Triple &TT) {
 
 /// This class defines the interface to the ThinLTO backend.
 /// Test
+/// Test2
 class lto::ThinBackendProc {
 protected:
   const Config &Conf;



[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)

2024-11-13 Thread Kyungwoo Lee via llvm-branch-commits

kyulee-com wrote:

> I can confirm from my testing on no-LTO projects that performance has 
> improved significantly and the slowdown is now acceptable. Before applying 
> the PR the slowdown was about 50%; now it is ~5%.

That's great to hear!
Since these PRs appear to be functioning, is it okay to merge them for now 
while we continue to discuss further improvements? Or do you have more comments 
to be addressed?

https://github.com/llvm/llvm-project/pull/115750


[llvm-branch-commits] [llvm] [CodeGen] Limit number of analyzed predecessors (PR #142584)

2025-06-13 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/142584


[llvm-branch-commits] [llvm] [CodeGen] Limit number of analyzed predecessors (PR #142584)

2025-06-04 Thread Kyungwoo Lee via llvm-branch-commits

kyulee-com wrote:

Adding this threshold check within `isTrellis()` feels somewhat unnatural. If 
compile time is a concern, could we simply check the size of the function (in 
terms of the number of blocks, as opposed to predecessors only) early in this 
pass and either skip it or switch to a faster, simpler algorithm? Also, a size 
of 1000 seems small; maybe 1? 
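
To make the suggestion concrete, here is a minimal sketch of an early size gate. The names `LayoutStrategy` and `chooseLayoutStrategy` are hypothetical, and the cutoff is a parameter because no agreed value exists yet; this is not the pass's actual code.

```cpp
#include <cstddef>

// Hypothetical outcome of the early check: run the full algorithm, or
// fall back to a cheaper one (or skip the pass entirely).
enum class LayoutStrategy { Full, Simple };

// Sketch of a gate evaluated once per function, before any per-block
// analysis such as the trellis check. NumBlocks is the number of basic
// blocks in the function; MaxBlocks is an assumed tunable cutoff.
LayoutStrategy chooseLayoutStrategy(std::size_t NumBlocks,
                                    std::size_t MaxBlocks) {
  return NumBlocks > MaxBlocks ? LayoutStrategy::Simple
                               : LayoutStrategy::Full;
}
```

Gating on the block count once per function keeps the cost decision in one place, rather than threading a threshold through individual helpers.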

https://github.com/llvm/llvm-project/pull/142584


[llvm-branch-commits] [llvm] [CodeGen] Limit number of analyzed predecessors (PR #142584)

2025-06-06 Thread Kyungwoo Lee via llvm-branch-commits

kyulee-com wrote:

Can we add a LIT test case using this flag? I think you could set it to a 
smaller number to create a test case.

https://github.com/llvm/llvm-project/pull/142584