[llvm-branch-commits] [llvm] release/19.x: Revert "[CGData] llvm-cgdata (#89884)" (PR #103886)
kyulee-com wrote:

> > So we should remove this tool from the 19.x release? Can someone confirm?
>
> @kyulee-com @thevinster Are you two able to help confirm this?

Yeah. I think we should remove this from the release, as it was reverted. We plan to re-land it via https://github.com/llvm/llvm-project/pull/101461 once it gets approved.

https://github.com/llvm/llvm-project/pull/103886
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: Revert "[CGData] llvm-cgdata (#89884)" (PR #103886)
https://github.com/kyulee-com approved this pull request. https://github.com/llvm/llvm-project/pull/103886
[llvm-branch-commits] [clang] [llvm] Thin11 (PR #111464)
https://github.com/kyulee-com created https://github.com/llvm/llvm-project/pull/111464 None >From 1249f0411388fb0832b49e80e7b6a0985822b026 Mon Sep 17 00:00:00 2001 From: Kyungwoo Lee Date: Fri, 13 Sep 2024 08:51:00 -0700 Subject: [PATCH 1/4] [CGData][ThinLTO] Global Outlining with Two-CodeGen Rounds --- llvm/include/llvm/CGData/CodeGenData.h| 16 +++ llvm/lib/CGData/CodeGenData.cpp | 81 +- llvm/lib/LTO/CMakeLists.txt | 1 + llvm/lib/LTO/LTO.cpp | 103 +- llvm/lib/LTO/LTOBackend.cpp | 11 ++ .../test/ThinLTO/AArch64/cgdata-two-rounds.ll | 94 llvm/test/ThinLTO/AArch64/lit.local.cfg | 2 + 7 files changed, 302 insertions(+), 6 deletions(-) create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-two-rounds.ll create mode 100644 llvm/test/ThinLTO/AArch64/lit.local.cfg diff --git a/llvm/include/llvm/CGData/CodeGenData.h b/llvm/include/llvm/CGData/CodeGenData.h index 84133a433170fe..1e1afe99327650 100644 --- a/llvm/include/llvm/CGData/CodeGenData.h +++ b/llvm/include/llvm/CGData/CodeGenData.h @@ -164,6 +164,22 @@ publishOutlinedHashTree(std::unique_ptr HashTree) { CodeGenData::getInstance().publishOutlinedHashTree(std::move(HashTree)); } +/// Initialize the two-codegen rounds. +void initializeTwoCodegenRounds(); + +/// Save the current module before the first codegen round. +void saveModuleForTwoRounds(const Module &TheModule, unsigned Task); + +/// Load the current module before the second codegen round. +std::unique_ptr loadModuleForTwoRounds(BitcodeModule &OrigModule, + unsigned Task, + LLVMContext &Context); + +/// Merge the codegen data from the input files in scratch vector in ThinLTO +/// two-codegen rounds. 
+Error mergeCodeGenData( +const std::unique_ptr>> InputFiles); + void warn(Error E, StringRef Whence = ""); void warn(Twine Message, std::string Whence = "", std::string Hint = ""); diff --git a/llvm/lib/CGData/CodeGenData.cpp b/llvm/lib/CGData/CodeGenData.cpp index 55d2504231c744..ff8e5dd7c75790 100644 --- a/llvm/lib/CGData/CodeGenData.cpp +++ b/llvm/lib/CGData/CodeGenData.cpp @@ -17,6 +17,7 @@ #include "llvm/Object/ObjectFile.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/FileSystem.h" +#include "llvm/Support/Path.h" #include "llvm/Support/WithColor.h" #define DEBUG_TYPE "cg-data" @@ -30,6 +31,14 @@ cl::opt cl::opt CodeGenDataUsePath("codegen-data-use-path", cl::init(""), cl::Hidden, cl::desc("File path to where .cgdata file is read")); +cl::opt CodeGenDataThinLTOTwoRounds( +"codegen-data-thinlto-two-rounds", cl::init(false), cl::Hidden, +cl::desc("Enable two-round ThinLTO code generation. The first round " + "emits codegen data, while the second round uses the emitted " + "codegen data for further optimizations.")); + +// Path to where the optimized bitcodes are saved and restored for ThinLTO. +static SmallString<128> CodeGenDataThinLTOTwoRoundsPath; static std::string getCGDataErrString(cgdata_error Err, const std::string &ErrMsg = "") { @@ -139,7 +148,7 @@ CodeGenData &CodeGenData::getInstance() { std::call_once(CodeGenData::OnceFlag, []() { Instance = std::unique_ptr(new CodeGenData()); -if (CodeGenDataGenerate) +if (CodeGenDataGenerate || CodeGenDataThinLTOTwoRounds) Instance->EmitCGData = true; else if (!CodeGenDataUsePath.empty()) { // Initialize the global CGData if the input file name is given. 
@@ -215,6 +224,76 @@ void warn(Error E, StringRef Whence) { } } +static std::string getPath(StringRef Dir, unsigned Task) { + return (Dir + "/" + llvm::Twine(Task) + ".saved_copy.bc").str(); +} + +void initializeTwoCodegenRounds() { + assert(CodeGenDataThinLTOTwoRounds); + if (auto EC = llvm::sys::fs::createUniqueDirectory( + "cgdata", CodeGenDataThinLTOTwoRoundsPath)) +report_fatal_error(Twine("Failed to create directory: ") + EC.message()); +} + +void saveModuleForTwoRounds(const Module &TheModule, unsigned Task) { + assert(sys::fs::is_directory(CodeGenDataThinLTOTwoRoundsPath)); + std::string Path = getPath(CodeGenDataThinLTOTwoRoundsPath, Task); + std::error_code EC; + raw_fd_ostream OS(Path, EC, sys::fs::OpenFlags::OF_None); + if (EC) +report_fatal_error(Twine("Failed to open ") + Path + + " to save optimized bitcode: " + EC.message()); + WriteBitcodeToFile(TheModule, OS, /* ShouldPreserveUseListOrder */ true); +} + +std::unique_ptr loadModuleForTwoRounds(BitcodeModule &OrigModule, + unsigned Task, + LLVMContext &Context) { + assert(sys::fs::is_directory(CodeGenDataThinLTOTwoRoundsPath)); + std::string Path = getPath(CodeGenDataThinLTO
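
The save/load pairing in the patch above (one `<Task>.saved_copy.bc` file per ThinLTO task, written before the first codegen round and read back before the second) can be sketched without LLVM. This is only an illustration under stated assumptions: `savedCopyPath`, `saveForSecondRound`, and `loadForSecondRound` are hypothetical names, and a plain byte string stands in for the bitcode that the patch writes with `WriteBitcodeToFile`.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

// Same naming scheme as getPath() in the patch: one file per task.
inline fs::path savedCopyPath(const fs::path &Dir, unsigned Task) {
  return Dir / (std::to_string(Task) + ".saved_copy.bc");
}

// Round one: persist the module bytes (hypothetical payload, not real bitcode).
inline void saveForSecondRound(const fs::path &Dir, unsigned Task,
                               const std::string &Bytes) {
  std::ofstream OS(savedCopyPath(Dir, Task),
                   std::ios::binary | std::ios::trunc);
  OS << Bytes;
} // OS is flushed and closed here.

// Round two: read back exactly what round one stored for this task.
inline std::string loadForSecondRound(const fs::path &Dir, unsigned Task) {
  std::ifstream IS(savedCopyPath(Dir, Task), std::ios::binary);
  return std::string(std::istreambuf_iterator<char>(IS),
                     std::istreambuf_iterator<char>());
}
```

Keying the file name on the task number means the second round can find its input without any extra bookkeeping, which appears to be the intent of `getPath` in the patch.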
[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)
https://github.com/kyulee-com ready_for_review https://github.com/llvm/llvm-project/pull/112638
[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)
https://github.com/kyulee-com edited https://github.com/llvm/llvm-project/pull/112638
[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)
https://github.com/kyulee-com updated https://github.com/llvm/llvm-project/pull/112638 >From 6225d74229d41068c57109a24b063f6fcba13985 Mon Sep 17 00:00:00 2001 From: Kyungwoo Lee Date: Wed, 16 Oct 2024 17:09:07 -0700 Subject: [PATCH 1/3] [StructuralHash] Support Differences This computes a structural hash while allowing for selective ignoring of certain operands based on a custom function that is provided. Instead of a single hash value, it now returns FunctionHashInfo which includes a hash value, an instruction mapping, and a map to track the operand location and its corresponding hash value that is ignored. --- llvm/include/llvm/IR/StructuralHash.h| 46 ++ llvm/lib/IR/StructuralHash.cpp | 188 +-- llvm/unittests/IR/StructuralHashTest.cpp | 55 +++ 3 files changed, 275 insertions(+), 14 deletions(-) diff --git a/llvm/include/llvm/IR/StructuralHash.h b/llvm/include/llvm/IR/StructuralHash.h index aa292bc3446799..bc82c204c4d1f6 100644 --- a/llvm/include/llvm/IR/StructuralHash.h +++ b/llvm/include/llvm/IR/StructuralHash.h @@ -14,7 +14,9 @@ #ifndef LLVM_IR_STRUCTURALHASH_H #define LLVM_IR_STRUCTURALHASH_H +#include "llvm/ADT/MapVector.h" #include "llvm/ADT/StableHashing.h" +#include "llvm/IR/Instruction.h" #include namespace llvm { @@ -23,6 +25,7 @@ class Function; class Module; using IRHash = stable_hash; +using OpndHash = stable_hash; /// Returns a hash of the function \p F. /// \param F The function to hash. @@ -37,6 +40,49 @@ IRHash StructuralHash(const Function &F, bool DetailedHash = false); /// composed the module hash. IRHash StructuralHash(const Module &M, bool DetailedHash = false); +/// The pair of an instruction index and a operand index. +using IndexPair = std::pair; + +/// A map from an instruction index to an instruction pointer. +using IndexInstrMap = MapVector; + +/// A map from an IndexPair to an OpndHash. 
+using IndexOperandHashMapType = DenseMap; + +/// A function that takes an instruction and an operand index and returns true +/// if the operand should be ignored in the function hash computation. +using IgnoreOperandFunc = std::function; + +struct FunctionHashInfo { + /// A hash value representing the structural content of the function + IRHash FunctionHash; + /// A mapping from instruction indices to instruction pointers + std::unique_ptr IndexInstruction; + /// A mapping from pairs of instruction indices and operand indices + /// to the hashes of the operands. This can be used to analyze or + /// reconstruct the differences in ignored operands + std::unique_ptr IndexOperandHashMap; + + FunctionHashInfo(IRHash FuntionHash, + std::unique_ptr IndexInstruction, + std::unique_ptr IndexOperandHashMap) + : FunctionHash(FuntionHash), +IndexInstruction(std::move(IndexInstruction)), +IndexOperandHashMap(std::move(IndexOperandHashMap)) {} +}; + +/// Computes a structural hash of a given function, considering the structure +/// and content of the function's instructions while allowing for selective +/// ignoring of certain operands based on custom criteria. This hash can be used +/// to identify functions that are structurally similar or identical, which is +/// useful in optimizations, deduplication, or analysis tasks. +/// \param F The function to hash. +/// \param IgnoreOp A callable that takes an instruction and an operand index, +/// and returns true if the operand should be ignored in the hash computation. 
+/// \return A FunctionHashInfo structure +FunctionHashInfo StructuralHashWithDifferences(const Function &F, + IgnoreOperandFunc IgnoreOp); + } // end namespace llvm #endif diff --git a/llvm/lib/IR/StructuralHash.cpp b/llvm/lib/IR/StructuralHash.cpp index a1fabab77d52b2..6e0af666010a05 100644 --- a/llvm/lib/IR/StructuralHash.cpp +++ b/llvm/lib/IR/StructuralHash.cpp @@ -28,6 +28,19 @@ class StructuralHashImpl { bool DetailedHash; + /// IgnoreOp is a function that returns true if the operand should be ignored. + IgnoreOperandFunc IgnoreOp = nullptr; + /// A mapping from instruction indices to instruction pointers. + /// The index represents the position of an instruction based on the order in + /// which it is first encountered. + std::unique_ptr IndexInstruction = nullptr; + /// A mapping from pairs of instruction indices and operand indices + /// to the hashes of the operands. + std::unique_ptr IndexOperandHashMap = nullptr; + + /// Assign a unique ID to each Value in the order they are first seen. + DenseMap ValueToId; + // This will produce different values on 32-bit and 64-bit systens as // hash_combine returns a size_t. However, this is only used for // detailed hashing which, in-tree, only needs to distinguish between @@ -47,24 +60,140 @@ class StructuralHashImpl { public: StructuralHashImpl() = delete; - explicit StructuralHashImpl(bool DetailedHash) : DetailedHas
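
The idea described in the commit message (hash a function while skipping selected operands, and record the skipped operands' hashes by location) can be sketched without LLVM. Everything below is a toy under stated assumptions: `ToyInst`, `ToyHashInfo`, `hashString`, `combine`, and `hashWithDifferences` are hypothetical names, strings stand in for IR operands, and an FNV-1a-style hash stands in for LLVM's stable hashing. It is not the patch's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR instruction: an opcode plus operand names.
struct ToyInst {
  std::string Opcode;
  std::vector<std::string> Operands;
};

using ToyHash = std::uint64_t;
using IndexPair = std::pair<unsigned, unsigned>; // (instruction, operand)
using IgnoreFn = std::function<bool(const ToyInst &, unsigned)>;

struct ToyHashInfo {
  ToyHash FunctionHash = 0;
  std::map<unsigned, const ToyInst *> IndexInstruction;
  std::map<IndexPair, ToyHash> IndexOperandHashMap;
};

// FNV-1a over the bytes of a string; deterministic across runs.
inline ToyHash hashString(const std::string &S) {
  ToyHash H = 14695981039346656037ULL;
  for (unsigned char C : S) {
    H ^= C;
    H *= 1099511628211ULL;
  }
  return H;
}

// Order-sensitive mixing of one value into a running hash.
inline ToyHash combine(ToyHash Seed, ToyHash V) {
  return (Seed ^ V) * 1099511628211ULL;
}

// Hashes Body; operands for which IgnoreOp returns true are left out of the
// function hash and recorded in IndexOperandHashMap instead.
// Body must outlive the returned ToyHashInfo, which stores pointers into it.
inline ToyHashInfo hashWithDifferences(const std::vector<ToyInst> &Body,
                                       const IgnoreFn &IgnoreOp) {
  ToyHashInfo Info;
  ToyHash H = 0;
  for (const ToyInst &I : Body) {
    unsigned InstIdx = static_cast<unsigned>(Info.IndexInstruction.size());
    Info.IndexInstruction.try_emplace(InstIdx, &I);
    H = combine(H, hashString(I.Opcode));
    for (unsigned OpIdx = 0; OpIdx < I.Operands.size(); ++OpIdx) {
      ToyHash OpH = hashString(I.Operands[OpIdx]);
      if (IgnoreOp && IgnoreOp(I, OpIdx)) {
        Info.IndexOperandHashMap[{InstIdx, OpIdx}] = OpH;
        continue;
      }
      H = combine(H, OpH);
    }
  }
  Info.FunctionHash = H;
  return Info;
}
```

With a predicate that ignores the callee of every `call`, two bodies that differ only in the callee get the same function hash, while the per-location operand hashes still tell them apart. That is the property a merging pass would presumably rely on when deciding which operands to parameterize.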
[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)
@@ -47,24 +60,140 @@ class StructuralHashImpl { public: StructuralHashImpl() = delete; - explicit StructuralHashImpl(bool DetailedHash) : DetailedHash(DetailedHash) {} + explicit StructuralHashImpl(bool DetailedHash, + IgnoreOperandFunc IgnoreOp = nullptr) + : DetailedHash(DetailedHash), IgnoreOp(IgnoreOp) { +if (IgnoreOp) { + IndexInstruction = std::make_unique(); + IndexOperandHashMap = std::make_unique(); +} + } - stable_hash hashConstant(Constant *C) { + stable_hash hashAPInt(const APInt &I) { SmallVector Hashes; -// TODO: hashArbitaryType() is not stable. -if (ConstantInt *ConstInt = dyn_cast(C)) { - Hashes.emplace_back(hashArbitaryType(ConstInt->getValue())); -} else if (ConstantFP *ConstFP = dyn_cast(C)) { - Hashes.emplace_back(hashArbitaryType(ConstFP->getValue())); -} else if (Function *Func = dyn_cast(C)) - // Hashing the name will be deterministic as LLVM's hashing infrastructure - // has explicit support for hashing strings and will not simply hash - // the pointer. - Hashes.emplace_back(hashArbitaryType(Func->getName())); +Hashes.emplace_back(I.getBitWidth()); +for (unsigned J = 0; J < I.getNumWords(); ++J) + Hashes.emplace_back((I.getRawData())[J]); +return stable_hash_combine(Hashes); + } + stable_hash hashAPFloat(const APFloat &F) { +SmallVector Hashes; +const fltSemantics &S = F.getSemantics(); +Hashes.emplace_back(APFloat::semanticsPrecision(S)); +Hashes.emplace_back(APFloat::semanticsMaxExponent(S)); +Hashes.emplace_back(APFloat::semanticsMinExponent(S)); +Hashes.emplace_back(APFloat::semanticsSizeInBits(S)); +Hashes.emplace_back(hashAPInt(F.bitcastToAPInt())); return stable_hash_combine(Hashes); } + stable_hash hashGlobalValue(const GlobalValue *GV) { +if (!GV->hasName()) + return 0; +return stable_hash_name(GV->getName()); + } + + // Compute a hash for a Constant. 
This function is logically similar to + // FunctionComparator::cmpConstants() in FunctionComparator.cpp, but here + // we're interested in computing a hash rather than comparing two Constants. + // Some of the logic is simplified, e.g, we don't expand GEPOperator. + stable_hash hashConstant(Constant *C) { +SmallVector Hashes; + +Type *Ty = C->getType(); +Hashes.emplace_back(hashType(Ty)); + +if (C->isNullValue()) { + Hashes.emplace_back(static_cast('N')); + return stable_hash_combine(Hashes); +} + +auto *G = dyn_cast(C); +if (G) { + Hashes.emplace_back(hashGlobalValue(G)); + return stable_hash_combine(Hashes); +} + +if (const auto *Seq = dyn_cast(C)) { + Hashes.emplace_back(xxh3_64bits(Seq->getRawDataValues())); + return stable_hash_combine(Hashes); +} + +switch (C->getValueID()) { +case Value::UndefValueVal: +case Value::PoisonValueVal: +case Value::ConstantTokenNoneVal: { + return stable_hash_combine(Hashes); +} +case Value::ConstantIntVal: { + const APInt &Int = cast(C)->getValue(); + Hashes.emplace_back(hashAPInt(Int)); + return stable_hash_combine(Hashes); +} +case Value::ConstantFPVal: { + const APFloat &APF = cast(C)->getValueAPF(); + Hashes.emplace_back(hashAPFloat(APF)); + return stable_hash_combine(Hashes); +} +case Value::ConstantArrayVal: { + const ConstantArray *A = cast(C); + uint64_t NumElements = cast(Ty)->getNumElements(); + Hashes.emplace_back(NumElements); kyulee-com wrote: Yeah. We could remove the count. https://github.com/llvm/llvm-project/pull/112638 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)
@@ -47,24 +60,140 @@ class StructuralHashImpl { public: StructuralHashImpl() = delete; - explicit StructuralHashImpl(bool DetailedHash) : DetailedHash(DetailedHash) {} + explicit StructuralHashImpl(bool DetailedHash, + IgnoreOperandFunc IgnoreOp = nullptr) + : DetailedHash(DetailedHash), IgnoreOp(IgnoreOp) { +if (IgnoreOp) { + IndexInstruction = std::make_unique(); + IndexOperandHashMap = std::make_unique(); +} + } - stable_hash hashConstant(Constant *C) { + stable_hash hashAPInt(const APInt &I) { SmallVector Hashes; -// TODO: hashArbitaryType() is not stable. -if (ConstantInt *ConstInt = dyn_cast(C)) { - Hashes.emplace_back(hashArbitaryType(ConstInt->getValue())); -} else if (ConstantFP *ConstFP = dyn_cast(C)) { - Hashes.emplace_back(hashArbitaryType(ConstFP->getValue())); -} else if (Function *Func = dyn_cast(C)) - // Hashing the name will be deterministic as LLVM's hashing infrastructure - // has explicit support for hashing strings and will not simply hash - // the pointer. - Hashes.emplace_back(hashArbitaryType(Func->getName())); +Hashes.emplace_back(I.getBitWidth()); +for (unsigned J = 0; J < I.getNumWords(); ++J) + Hashes.emplace_back((I.getRawData())[J]); +return stable_hash_combine(Hashes); + } + stable_hash hashAPFloat(const APFloat &F) { +SmallVector Hashes; +const fltSemantics &S = F.getSemantics(); +Hashes.emplace_back(APFloat::semanticsPrecision(S)); +Hashes.emplace_back(APFloat::semanticsMaxExponent(S)); +Hashes.emplace_back(APFloat::semanticsMinExponent(S)); +Hashes.emplace_back(APFloat::semanticsSizeInBits(S)); +Hashes.emplace_back(hashAPInt(F.bitcastToAPInt())); return stable_hash_combine(Hashes); } + stable_hash hashGlobalValue(const GlobalValue *GV) { +if (!GV->hasName()) + return 0; +return stable_hash_name(GV->getName()); kyulee-com wrote: `stable_hash_name` itself already handles it by calling `get_stable_name`. 
https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/StableHashing.h#L55-L74 https://github.com/llvm/llvm-project/pull/112638
[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)
@@ -47,24 +60,140 @@ class StructuralHashImpl { public: StructuralHashImpl() = delete; - explicit StructuralHashImpl(bool DetailedHash) : DetailedHash(DetailedHash) {} + explicit StructuralHashImpl(bool DetailedHash, + IgnoreOperandFunc IgnoreOp = nullptr) + : DetailedHash(DetailedHash), IgnoreOp(IgnoreOp) { +if (IgnoreOp) { + IndexInstruction = std::make_unique(); + IndexOperandHashMap = std::make_unique(); +} + } - stable_hash hashConstant(Constant *C) { + stable_hash hashAPInt(const APInt &I) { SmallVector Hashes; -// TODO: hashArbitaryType() is not stable. -if (ConstantInt *ConstInt = dyn_cast(C)) { - Hashes.emplace_back(hashArbitaryType(ConstInt->getValue())); -} else if (ConstantFP *ConstFP = dyn_cast(C)) { - Hashes.emplace_back(hashArbitaryType(ConstFP->getValue())); -} else if (Function *Func = dyn_cast(C)) - // Hashing the name will be deterministic as LLVM's hashing infrastructure - // has explicit support for hashing strings and will not simply hash - // the pointer. - Hashes.emplace_back(hashArbitaryType(Func->getName())); +Hashes.emplace_back(I.getBitWidth()); +for (unsigned J = 0; J < I.getNumWords(); ++J) + Hashes.emplace_back((I.getRawData())[J]); +return stable_hash_combine(Hashes); + } + stable_hash hashAPFloat(const APFloat &F) { +SmallVector Hashes; +const fltSemantics &S = F.getSemantics(); +Hashes.emplace_back(APFloat::semanticsPrecision(S)); +Hashes.emplace_back(APFloat::semanticsMaxExponent(S)); +Hashes.emplace_back(APFloat::semanticsMinExponent(S)); +Hashes.emplace_back(APFloat::semanticsSizeInBits(S)); +Hashes.emplace_back(hashAPInt(F.bitcastToAPInt())); return stable_hash_combine(Hashes); } + stable_hash hashGlobalValue(const GlobalValue *GV) { +if (!GV->hasName()) + return 0; +return stable_hash_name(GV->getName()); + } + + // Compute a hash for a Constant. 
This function is logically similar to + // FunctionComparator::cmpConstants() in FunctionComparator.cpp, but here + // we're interested in computing a hash rather than comparing two Constants. + // Some of the logic is simplified, e.g, we don't expand GEPOperator. + stable_hash hashConstant(Constant *C) { +SmallVector Hashes; + +Type *Ty = C->getType(); +Hashes.emplace_back(hashType(Ty)); + +if (C->isNullValue()) { + Hashes.emplace_back(static_cast('N')); + return stable_hash_combine(Hashes); +} + +auto *G = dyn_cast(C); +if (G) { + Hashes.emplace_back(hashGlobalValue(G)); + return stable_hash_combine(Hashes); +} + +if (const auto *Seq = dyn_cast(C)) { + Hashes.emplace_back(xxh3_64bits(Seq->getRawDataValues())); + return stable_hash_combine(Hashes); +} + +switch (C->getValueID()) { +case Value::UndefValueVal: +case Value::PoisonValueVal: +case Value::ConstantTokenNoneVal: { + return stable_hash_combine(Hashes); +} +case Value::ConstantIntVal: { + const APInt &Int = cast(C)->getValue(); + Hashes.emplace_back(hashAPInt(Int)); + return stable_hash_combine(Hashes); +} +case Value::ConstantFPVal: { + const APFloat &APF = cast(C)->getValueAPF(); + Hashes.emplace_back(hashAPFloat(APF)); + return stable_hash_combine(Hashes); +} +case Value::ConstantArrayVal: { + const ConstantArray *A = cast(C); + uint64_t NumElements = cast(Ty)->getNumElements(); + Hashes.emplace_back(NumElements); + for (auto &Op : A->operands()) { +auto H = hashConstant(cast(Op)); +Hashes.emplace_back(H); + } + return stable_hash_combine(Hashes); +} +case Value::ConstantStructVal: { + const ConstantStruct *S = cast(C); + unsigned NumElements = cast(Ty)->getNumElements(); + Hashes.emplace_back(NumElements); + for (auto &Op : S->operands()) { +auto H = hashConstant(cast(Op)); +Hashes.emplace_back(H); + } + return stable_hash_combine(Hashes); +} +case Value::ConstantVectorVal: { + const ConstantVector *V = cast(C); + unsigned NumElements = cast(Ty)->getNumElements(); + Hashes.emplace_back(NumElements); + 
for (auto &Op : V->operands()) { +auto H = hashConstant(cast(Op)); +Hashes.emplace_back(H); + } + return stable_hash_combine(Hashes); +} +case Value::ConstantExprVal: { + const ConstantExpr *E = cast(C); + unsigned NumOperands = E->getNumOperands(); + Hashes.emplace_back(NumOperands); + for (auto &Op : E->operands()) { +auto H = hashConstant(cast(Op)); +Hashes.emplace_back(H); + } + return stable_hash_combine(Hashes); +} +case Value::BlockAddressVal: { + const BlockAddress *BA = cast(C); + auto H = hashGlobalValue(BA->getFunction()); + Hashes.emplace_back(H); + return stable_hash_co
[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)
kyulee-com wrote:

> IIRC we have several lit tests that cover structural hash, shouldn't we have a new test there that uses the new functionality?

Extended the existing `StructuralHashPrinterPass` with `Options`, and updated the corresponding lit test accordingly.

https://github.com/llvm/llvm-project/pull/112638
[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)
@@ -47,24 +60,140 @@ class StructuralHashImpl { public: StructuralHashImpl() = delete; - explicit StructuralHashImpl(bool DetailedHash) : DetailedHash(DetailedHash) {} + explicit StructuralHashImpl(bool DetailedHash, + IgnoreOperandFunc IgnoreOp = nullptr) + : DetailedHash(DetailedHash), IgnoreOp(IgnoreOp) { +if (IgnoreOp) { + IndexInstruction = std::make_unique(); + IndexOperandHashMap = std::make_unique(); +} + } - stable_hash hashConstant(Constant *C) { + stable_hash hashAPInt(const APInt &I) { SmallVector Hashes; -// TODO: hashArbitaryType() is not stable. -if (ConstantInt *ConstInt = dyn_cast(C)) { - Hashes.emplace_back(hashArbitaryType(ConstInt->getValue())); -} else if (ConstantFP *ConstFP = dyn_cast(C)) { - Hashes.emplace_back(hashArbitaryType(ConstFP->getValue())); -} else if (Function *Func = dyn_cast(C)) - // Hashing the name will be deterministic as LLVM's hashing infrastructure - // has explicit support for hashing strings and will not simply hash - // the pointer. - Hashes.emplace_back(hashArbitaryType(Func->getName())); +Hashes.emplace_back(I.getBitWidth()); +for (unsigned J = 0; J < I.getNumWords(); ++J) + Hashes.emplace_back((I.getRawData())[J]); +return stable_hash_combine(Hashes); + } + stable_hash hashAPFloat(const APFloat &F) { +SmallVector Hashes; +const fltSemantics &S = F.getSemantics(); +Hashes.emplace_back(APFloat::semanticsPrecision(S)); +Hashes.emplace_back(APFloat::semanticsMaxExponent(S)); +Hashes.emplace_back(APFloat::semanticsMinExponent(S)); +Hashes.emplace_back(APFloat::semanticsSizeInBits(S)); +Hashes.emplace_back(hashAPInt(F.bitcastToAPInt())); kyulee-com wrote: yeah. we could simplify it. https://github.com/llvm/llvm-project/pull/112638 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)
@@ -47,24 +60,140 @@ class StructuralHashImpl { public: StructuralHashImpl() = delete; - explicit StructuralHashImpl(bool DetailedHash) : DetailedHash(DetailedHash) {} + explicit StructuralHashImpl(bool DetailedHash, + IgnoreOperandFunc IgnoreOp = nullptr) + : DetailedHash(DetailedHash), IgnoreOp(IgnoreOp) { +if (IgnoreOp) { + IndexInstruction = std::make_unique(); + IndexOperandHashMap = std::make_unique(); +} + } - stable_hash hashConstant(Constant *C) { + stable_hash hashAPInt(const APInt &I) { SmallVector Hashes; -// TODO: hashArbitaryType() is not stable. -if (ConstantInt *ConstInt = dyn_cast(C)) { - Hashes.emplace_back(hashArbitaryType(ConstInt->getValue())); -} else if (ConstantFP *ConstFP = dyn_cast(C)) { - Hashes.emplace_back(hashArbitaryType(ConstFP->getValue())); -} else if (Function *Func = dyn_cast(C)) - // Hashing the name will be deterministic as LLVM's hashing infrastructure - // has explicit support for hashing strings and will not simply hash - // the pointer. - Hashes.emplace_back(hashArbitaryType(Func->getName())); +Hashes.emplace_back(I.getBitWidth()); +for (unsigned J = 0; J < I.getNumWords(); ++J) + Hashes.emplace_back((I.getRawData())[J]); +return stable_hash_combine(Hashes); + } + stable_hash hashAPFloat(const APFloat &F) { +SmallVector Hashes; +const fltSemantics &S = F.getSemantics(); +Hashes.emplace_back(APFloat::semanticsPrecision(S)); +Hashes.emplace_back(APFloat::semanticsMaxExponent(S)); +Hashes.emplace_back(APFloat::semanticsMinExponent(S)); +Hashes.emplace_back(APFloat::semanticsSizeInBits(S)); +Hashes.emplace_back(hashAPInt(F.bitcastToAPInt())); return stable_hash_combine(Hashes); } + stable_hash hashGlobalValue(const GlobalValue *GV) { +if (!GV->hasName()) + return 0; +return stable_hash_name(GV->getName()); + } + + // Compute a hash for a Constant. 
This function is logically similar to + // FunctionComparator::cmpConstants() in FunctionComparator.cpp, but here + // we're interested in computing a hash rather than comparing two Constants. + // Some of the logic is simplified, e.g, we don't expand GEPOperator. + stable_hash hashConstant(Constant *C) { +SmallVector Hashes; + +Type *Ty = C->getType(); +Hashes.emplace_back(hashType(Ty)); + +if (C->isNullValue()) { + Hashes.emplace_back(static_cast('N')); + return stable_hash_combine(Hashes); +} + +auto *G = dyn_cast(C); +if (G) { + Hashes.emplace_back(hashGlobalValue(G)); + return stable_hash_combine(Hashes); +} + +if (const auto *Seq = dyn_cast(C)) { + Hashes.emplace_back(xxh3_64bits(Seq->getRawDataValues())); + return stable_hash_combine(Hashes); +} + +switch (C->getValueID()) { +case Value::UndefValueVal: +case Value::PoisonValueVal: +case Value::ConstantTokenNoneVal: { + return stable_hash_combine(Hashes); +} +case Value::ConstantIntVal: { + const APInt &Int = cast(C)->getValue(); + Hashes.emplace_back(hashAPInt(Int)); + return stable_hash_combine(Hashes); +} +case Value::ConstantFPVal: { + const APFloat &APF = cast(C)->getValueAPF(); + Hashes.emplace_back(hashAPFloat(APF)); + return stable_hash_combine(Hashes); +} +case Value::ConstantArrayVal: { + const ConstantArray *A = cast(C); + uint64_t NumElements = cast(Ty)->getNumElements(); + Hashes.emplace_back(NumElements); + for (auto &Op : A->operands()) { +auto H = hashConstant(cast(Op)); +Hashes.emplace_back(H); + } + return stable_hash_combine(Hashes); +} +case Value::ConstantStructVal: { + const ConstantStruct *S = cast(C); + unsigned NumElements = cast(Ty)->getNumElements(); + Hashes.emplace_back(NumElements); + for (auto &Op : S->operands()) { +auto H = hashConstant(cast(Op)); +Hashes.emplace_back(H); + } + return stable_hash_combine(Hashes); kyulee-com wrote: Most cases are simply grouped. 
https://github.com/llvm/llvm-project/pull/112638
[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)
@@ -100,8 +233,20 @@ class StructuralHashImpl {
     if (const auto *ComparisonInstruction = dyn_cast<CmpInst>(&Inst))
       Hashes.emplace_back(ComparisonInstruction->getPredicate());

-    for (const auto &Op : Inst.operands())
-      Hashes.emplace_back(hashOperand(Op));
+    unsigned InstIdx = 0;
+    if (IndexInstruction) {
+      InstIdx = IndexInstruction->size();
+      IndexInstruction->insert({InstIdx, const_cast<Instruction *>(&Inst)});

kyulee-com wrote:

Each instruction is inserted exactly once by design in this pass. In fact, the `IndexInstruction` map itself can't catch duplicates, since its key is the index, not the `Instruction *`. Anyhow, replaced `insert` with `try_emplace` for efficiency.

https://github.com/llvm/llvm-project/pull/112638
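
The point about the key being the index can be shown with a small standalone sketch. It assumes `std::map` as a stand-in for LLVM's `MapVector` so that it builds without LLVM; `ToyInstruction` and `recordInstruction` are hypothetical names, not part of the patch.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for llvm::Instruction.
struct ToyInstruction {
  int Opcode;
};

// Index -> instruction pointer, as in the patch's IndexInstrMap.
using IndexInstrMap = std::map<unsigned, ToyInstruction *>;

// Assigns the next free index to I and returns it. Because the key is the
// index (always fresh), recording the same pointer twice creates two entries;
// the map cannot detect the duplicate.
inline unsigned recordInstruction(IndexInstrMap &Map, ToyInstruction *I) {
  unsigned Idx = static_cast<unsigned>(Map.size());
  Map.try_emplace(Idx, I); // constructs the entry in place; key is new
  return Idx;
}
```

Calling `recordInstruction` twice with the same pointer yields indices 0 and 1 and a map of size 2, so uniqueness has to come from the traversal visiting each instruction once, as the comment says.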
[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)
kyulee-com wrote: The test failure `TableGen/x86-fold-tables.td` seems unrelated. https://github.com/llvm/llvm-project/pull/112638
[llvm-branch-commits] [llvm] [CGData] Global Merge Functions (PR #112671)
https://github.com/kyulee-com updated https://github.com/llvm/llvm-project/pull/112671 >From ded5771bb4ff7c8fd5401b4efe0af988539a8162 Mon Sep 17 00:00:00 2001 From: Kyungwoo Lee Date: Fri, 30 Aug 2024 00:09:09 -0700 Subject: [PATCH 1/2] [CGData] Global Merge Functions --- llvm/include/llvm/CGData/CodeGenData.h| 11 + llvm/include/llvm/InitializePasses.h | 1 + llvm/include/llvm/LinkAllPasses.h | 1 + llvm/include/llvm/Passes/CodeGenPassBuilder.h | 1 + llvm/include/llvm/Transforms/IPO.h| 2 + .../Transforms/IPO/GlobalMergeFunctions.h | 77 ++ llvm/lib/CodeGen/TargetPassConfig.cpp | 3 + llvm/lib/LTO/LTO.cpp | 1 + llvm/lib/Transforms/IPO/CMakeLists.txt| 2 + .../Transforms/IPO/GlobalMergeFunctions.cpp | 687 ++ .../ThinLTO/AArch64/cgdata-merge-local.ll | 62 ++ .../test/ThinLTO/AArch64/cgdata-merge-read.ll | 82 +++ .../AArch64/cgdata-merge-two-rounds.ll| 68 ++ .../ThinLTO/AArch64/cgdata-merge-write.ll | 97 +++ llvm/tools/llvm-lto2/CMakeLists.txt | 1 + llvm/tools/llvm-lto2/llvm-lto2.cpp| 6 + 16 files changed, 1102 insertions(+) create mode 100644 llvm/include/llvm/Transforms/IPO/GlobalMergeFunctions.h create mode 100644 llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-local.ll create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-read.ll create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-two-rounds.ll create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-write.ll diff --git a/llvm/include/llvm/CGData/CodeGenData.h b/llvm/include/llvm/CGData/CodeGenData.h index 5d7c74725ccef1..da0e412f2a0e03 100644 --- a/llvm/include/llvm/CGData/CodeGenData.h +++ b/llvm/include/llvm/CGData/CodeGenData.h @@ -145,6 +145,9 @@ class CodeGenData { const OutlinedHashTree *getOutlinedHashTree() { return PublishedHashTree.get(); } + const StableFunctionMap *getStableFunctionMap() { +return PublishedStableFunctionMap.get(); + } /// Returns true if we should write codegen data. 
bool emitCGData() { return EmitCGData; } @@ -169,10 +172,18 @@ inline bool hasOutlinedHashTree() { return CodeGenData::getInstance().hasOutlinedHashTree(); } +inline bool hasStableFunctionMap() { + return CodeGenData::getInstance().hasStableFunctionMap(); +} + inline const OutlinedHashTree *getOutlinedHashTree() { return CodeGenData::getInstance().getOutlinedHashTree(); } +inline const StableFunctionMap *getStableFunctionMap() { + return CodeGenData::getInstance().getStableFunctionMap(); +} + inline bool emitCGData() { return CodeGenData::getInstance().emitCGData(); } inline void diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index 4352099d6dbb99..9aa36d5bb7f801 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -123,6 +123,7 @@ void initializeGCEmptyBasicBlocksPass(PassRegistry &); void initializeGCMachineCodeAnalysisPass(PassRegistry &); void initializeGCModuleInfoPass(PassRegistry &); void initializeGVNLegacyPassPass(PassRegistry &); +void initializeGlobalMergeFuncPass(PassRegistry &); void initializeGlobalMergePass(PassRegistry &); void initializeGlobalsAAWrapperPassPass(PassRegistry &); void initializeHardwareLoopsLegacyPass(PassRegistry &); diff --git a/llvm/include/llvm/LinkAllPasses.h b/llvm/include/llvm/LinkAllPasses.h index 92b59a66567c95..ea3609a2b4bc71 100644 --- a/llvm/include/llvm/LinkAllPasses.h +++ b/llvm/include/llvm/LinkAllPasses.h @@ -79,6 +79,7 @@ struct ForcePassLinking { (void)llvm::createDomOnlyViewerWrapperPassPass(); (void)llvm::createDomViewerWrapperPassPass(); (void)llvm::createAlwaysInlinerLegacyPass(); +(void)llvm::createGlobalMergeFuncPass(); (void)llvm::createGlobalsAAWrapperPass(); (void)llvm::createInstSimplifyLegacyPass(); (void)llvm::createInstructionCombiningPass(); diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h index 13bc4700d87029..96b5b815132bc0 100644 --- 
a/llvm/include/llvm/Passes/CodeGenPassBuilder.h +++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h @@ -74,6 +74,7 @@ #include "llvm/Target/CGPassBuilderOption.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Transforms/CFGuard.h" +#include "llvm/Transforms/IPO/GlobalMergeFunctions.h" #include "llvm/Transforms/Scalar/ConstantHoisting.h" #include "llvm/Transforms/Scalar/LoopPassManager.h" #include "llvm/Transforms/Scalar/LoopStrengthReduce.h" diff --git a/llvm/include/llvm/Transforms/IPO.h b/llvm/include/llvm/Transforms/IPO.h index ee0e35aa618325..86a8654f56997c 100644 --- a/llvm/include/llvm/Transforms/IPO.h +++ b/llvm/include/llvm/Transforms/IPO.h @@ -55,6 +55,8 @@ enum class PassSummaryAction { Export, ///< Export information to summary. }; +Pass *createGlobalMergeF
[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)
https://github.com/kyulee-com updated https://github.com/llvm/llvm-project/pull/112638 >From 6225d74229d41068c57109a24b063f6fcba13985 Mon Sep 17 00:00:00 2001 From: Kyungwoo Lee Date: Wed, 16 Oct 2024 17:09:07 -0700 Subject: [PATCH 1/4] [StructuralHash] Support Differences This computes a structural hash while allowing for selective ignoring of certain operands based on a custom function that is provided. Instead of a single hash value, it now returns FunctionHashInfo which includes a hash value, an instruction mapping, and a map to track the operand location and its corresponding hash value that is ignored. --- llvm/include/llvm/IR/StructuralHash.h| 46 ++ llvm/lib/IR/StructuralHash.cpp | 188 +-- llvm/unittests/IR/StructuralHashTest.cpp | 55 +++ 3 files changed, 275 insertions(+), 14 deletions(-) diff --git a/llvm/include/llvm/IR/StructuralHash.h b/llvm/include/llvm/IR/StructuralHash.h index aa292bc3446799..bc82c204c4d1f6 100644 --- a/llvm/include/llvm/IR/StructuralHash.h +++ b/llvm/include/llvm/IR/StructuralHash.h @@ -14,7 +14,9 @@ #ifndef LLVM_IR_STRUCTURALHASH_H #define LLVM_IR_STRUCTURALHASH_H +#include "llvm/ADT/MapVector.h" #include "llvm/ADT/StableHashing.h" +#include "llvm/IR/Instruction.h" #include namespace llvm { @@ -23,6 +25,7 @@ class Function; class Module; using IRHash = stable_hash; +using OpndHash = stable_hash; /// Returns a hash of the function \p F. /// \param F The function to hash. @@ -37,6 +40,49 @@ IRHash StructuralHash(const Function &F, bool DetailedHash = false); /// composed the module hash. IRHash StructuralHash(const Module &M, bool DetailedHash = false); +/// The pair of an instruction index and a operand index. +using IndexPair = std::pair; + +/// A map from an instruction index to an instruction pointer. +using IndexInstrMap = MapVector; + +/// A map from an IndexPair to an OpndHash. 
+using IndexOperandHashMapType = DenseMap; + +/// A function that takes an instruction and an operand index and returns true +/// if the operand should be ignored in the function hash computation. +using IgnoreOperandFunc = std::function; + +struct FunctionHashInfo { + /// A hash value representing the structural content of the function + IRHash FunctionHash; + /// A mapping from instruction indices to instruction pointers + std::unique_ptr IndexInstruction; + /// A mapping from pairs of instruction indices and operand indices + /// to the hashes of the operands. This can be used to analyze or + /// reconstruct the differences in ignored operands + std::unique_ptr IndexOperandHashMap; + + FunctionHashInfo(IRHash FuntionHash, + std::unique_ptr IndexInstruction, + std::unique_ptr IndexOperandHashMap) + : FunctionHash(FuntionHash), +IndexInstruction(std::move(IndexInstruction)), +IndexOperandHashMap(std::move(IndexOperandHashMap)) {} +}; + +/// Computes a structural hash of a given function, considering the structure +/// and content of the function's instructions while allowing for selective +/// ignoring of certain operands based on custom criteria. This hash can be used +/// to identify functions that are structurally similar or identical, which is +/// useful in optimizations, deduplication, or analysis tasks. +/// \param F The function to hash. +/// \param IgnoreOp A callable that takes an instruction and an operand index, +/// and returns true if the operand should be ignored in the hash computation. 
+/// \return A FunctionHashInfo structure +FunctionHashInfo StructuralHashWithDifferences(const Function &F, + IgnoreOperandFunc IgnoreOp); + } // end namespace llvm #endif diff --git a/llvm/lib/IR/StructuralHash.cpp b/llvm/lib/IR/StructuralHash.cpp index a1fabab77d52b2..6e0af666010a05 100644 --- a/llvm/lib/IR/StructuralHash.cpp +++ b/llvm/lib/IR/StructuralHash.cpp @@ -28,6 +28,19 @@ class StructuralHashImpl { bool DetailedHash; + /// IgnoreOp is a function that returns true if the operand should be ignored. + IgnoreOperandFunc IgnoreOp = nullptr; + /// A mapping from instruction indices to instruction pointers. + /// The index represents the position of an instruction based on the order in + /// which it is first encountered. + std::unique_ptr IndexInstruction = nullptr; + /// A mapping from pairs of instruction indices and operand indices + /// to the hashes of the operands. + std::unique_ptr IndexOperandHashMap = nullptr; + + /// Assign a unique ID to each Value in the order they are first seen. + DenseMap ValueToId; + // This will produce different values on 32-bit and 64-bit systens as // hash_combine returns a size_t. However, this is only used for // detailed hashing which, in-tree, only needs to distinguish between @@ -47,24 +60,140 @@ class StructuralHashImpl { public: StructuralHashImpl() = delete; - explicit StructuralHashImpl(bool DetailedHash) : DetailedHas
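The idea in the commit message above (hash a function while leaving selected operands out, and remember what was left out) can be illustrated with a small standalone sketch. This is a toy model under stated assumptions, not the code from the patch: ToyInst, ToyHashInfo, hashString, combine and toyHashWithDifferences are hypothetical names, the "instructions" are plain strings rather than llvm::Instruction, and FNV-1a is used only because it is short and deterministic.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an IR instruction: an opcode plus operand names.
// This is NOT llvm::Instruction; it only models the shape of the idea.
struct ToyInst {
  std::string Opcode;
  std::vector<std::string> Operands;
};

using ToyHash = std::uint64_t;
// (instruction index, operand index), mirroring the IndexPair in the patch.
using ToyIndexPair = std::pair<unsigned, unsigned>;
using ToyIgnoreFunc = std::function<bool(const ToyInst &, unsigned)>;

struct ToyHashInfo {
  ToyHash FunctionHash = 0;
  // Hashes of the operands that were left out of FunctionHash.
  std::map<ToyIndexPair, ToyHash> IgnoredOperandHashes;
};

// FNV-1a over a string (an assumption of this sketch, not the patch's hash).
inline ToyHash hashString(const std::string &S) {
  ToyHash H = 14695981039346656037ULL;
  for (unsigned char C : S) {
    H ^= C;
    H *= 1099511628211ULL;
  }
  return H;
}

// Order-sensitive mixing of two hash values.
inline ToyHash combine(ToyHash A, ToyHash B) {
  return (A ^ B) * 1099511628211ULL;
}

inline ToyHashInfo toyHashWithDifferences(const std::vector<ToyInst> &Body,
                                          const ToyIgnoreFunc &IgnoreOp) {
  assert(IgnoreOp && "an ignore predicate is required");
  ToyHashInfo Info;
  for (unsigned I = 0; I < Body.size(); ++I) {
    Info.FunctionHash = combine(Info.FunctionHash, hashString(Body[I].Opcode));
    for (unsigned J = 0; J < Body[I].Operands.size(); ++J) {
      ToyHash OpHash = hashString(Body[I].Operands[J]);
      if (IgnoreOp(Body[I], J))
        Info.IgnoredOperandHashes[{I, J}] = OpHash; // record, do not mix in
      else
        Info.FunctionHash = combine(Info.FunctionHash, OpHash);
    }
  }
  return Info;
}
```

With a predicate that ignores operand 0 of every "call", two bodies that differ only in their callee get the same FunctionHash, while the per-location map still tells them apart. That is the property a merging pass would need in order to turn the differing operand into a parameter.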
[llvm-branch-commits] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)
https://github.com/kyulee-com updated https://github.com/llvm/llvm-project/pull/112664 >From 3b73ee558d57434ee1f8447ac2509db371d95d8f Mon Sep 17 00:00:00 2001 From: Kyungwoo Lee Date: Mon, 9 Sep 2024 19:38:05 -0700 Subject: [PATCH] [CGData][llvm-cgdata] Support for stable function map This introduces a new cgdata format for stable function maps. The raw data is embedded in the __llvm_merge section during compile time. This data can be read and merged using the llvm-cgdata tool, into an indexed cgdata file. Consequently, the tool is now capable of handling either outlined hash trees, stable function maps, or both, as they are orthogonal. --- llvm/docs/CommandGuide/llvm-cgdata.rst| 16 ++-- llvm/include/llvm/CGData/CodeGenData.h| 24 +- llvm/include/llvm/CGData/CodeGenData.inc | 12 ++- llvm/include/llvm/CGData/CodeGenDataReader.h | 29 ++- llvm/include/llvm/CGData/CodeGenDataWriter.h | 17 +++- llvm/lib/CGData/CodeGenData.cpp | 30 --- llvm/lib/CGData/CodeGenDataReader.cpp | 63 +- llvm/lib/CGData/CodeGenDataWriter.cpp | 30 ++- llvm/test/tools/llvm-cgdata/empty.test| 8 +- llvm/test/tools/llvm-cgdata/error.test| 13 +-- .../merge-combined-funcmap-hashtree.test | 66 +++ .../llvm-cgdata/merge-funcmap-archive.test| 83 +++ .../llvm-cgdata/merge-funcmap-concat.test | 78 + .../llvm-cgdata/merge-funcmap-double.test | 79 ++ .../llvm-cgdata/merge-funcmap-single.test | 36 ...chive.test => merge-hashtree-archive.test} | 8 +- ...concat.test => merge-hashtree-concat.test} | 6 +- ...double.test => merge-hashtree-double.test} | 8 +- ...single.test => merge-hashtree-single.test} | 4 +- llvm/tools/llvm-cgdata/llvm-cgdata.cpp| 46 +++--- 20 files changed, 572 insertions(+), 84 deletions(-) create mode 100644 llvm/test/tools/llvm-cgdata/merge-combined-funcmap-hashtree.test create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-archive.test create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-concat.test create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-double.test 
create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-single.test rename llvm/test/tools/llvm-cgdata/{merge-archive.test => merge-hashtree-archive.test} (91%) rename llvm/test/tools/llvm-cgdata/{merge-concat.test => merge-hashtree-concat.test} (93%) rename llvm/test/tools/llvm-cgdata/{merge-double.test => merge-hashtree-double.test} (90%) rename llvm/test/tools/llvm-cgdata/{merge-single.test => merge-hashtree-single.test} (92%) diff --git a/llvm/docs/CommandGuide/llvm-cgdata.rst b/llvm/docs/CommandGuide/llvm-cgdata.rst index f592e1508844ee..0670decd087e39 100644 --- a/llvm/docs/CommandGuide/llvm-cgdata.rst +++ b/llvm/docs/CommandGuide/llvm-cgdata.rst @@ -11,15 +11,13 @@ SYNOPSIS DESCRIPTION --- -The :program:llvm-cgdata utility parses raw codegen data embedded -in compiled binary files and merges them into a single .cgdata file. -It can also inspect and manipulate .cgdata files. -Currently, the tool supports saving and restoring outlined hash trees, -enabling global function outlining across modules, allowing for more -efficient function outlining in subsequent compilations. -The design is extensible, allowing for the incorporation of additional -codegen summaries and optimization techniques, such as global function -merging, in the future. +The :program:llvm-cgdata utility parses raw codegen data embedded in compiled +binary files and merges them into a single .cgdata file. It can also inspect +and manipulate .cgdata files. Currently, the tool supports saving and restoring +outlined hash trees and stable function maps, allowing for more efficient +function outlining and function merging across modules in subsequent +compilations. The design is extensible, allowing for the incorporation of +additional codegen summaries and optimization techniques. 
COMMANDS diff --git a/llvm/include/llvm/CGData/CodeGenData.h b/llvm/include/llvm/CGData/CodeGenData.h index 53550beeae1f83..5d7c74725ccef1 100644 --- a/llvm/include/llvm/CGData/CodeGenData.h +++ b/llvm/include/llvm/CGData/CodeGenData.h @@ -19,6 +19,7 @@ #include "llvm/Bitcode/BitcodeReader.h" #include "llvm/CGData/OutlinedHashTree.h" #include "llvm/CGData/OutlinedHashTreeRecord.h" +#include "llvm/CGData/StableFunctionMapRecord.h" #include "llvm/IR/Module.h" #include "llvm/Object/ObjectFile.h" #include "llvm/Support/Caching.h" @@ -41,7 +42,9 @@ enum class CGDataKind { Unknown = 0x0, // A function outlining info. FunctionOutlinedHashTree = 0x1, - LLVM_MARK_AS_BITMASK_ENUM(/*LargestValue=*/FunctionOutlinedHashTree) + // A function merging info. + StableFunctionMergingMap = 0x2, + LLVM_MARK_AS_BITMASK_ENUM(/*LargestValue=*/StableFunctionMergingMap) }; const std::error_category &cgdata_category(); @@ -108,6 +111,8
[llvm-branch-commits] [lld] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)
https://github.com/kyulee-com updated https://github.com/llvm/llvm-project/pull/112664 >From f6fc25953b8f5109abb968c43ebc7d53f2e475db Mon Sep 17 00:00:00 2001 From: Kyungwoo Lee Date: Mon, 9 Sep 2024 19:38:05 -0700 Subject: [PATCH] [CGData][llvm-cgdata] Support for stable function map This introduces a new cgdata format for stable function maps. The raw data is embedded in the __llvm_merge section during compile time. This data can be read and merged using the llvm-cgdata tool, into an indexed cgdata file. Consequently, the tool is now capable of handling either outlined hash trees, stable function maps, or both, as they are orthogonal. --- lld/test/MachO/cgdata-generate.s | 6 +- llvm/docs/CommandGuide/llvm-cgdata.rst| 16 ++-- llvm/include/llvm/CGData/CodeGenData.h| 24 +- llvm/include/llvm/CGData/CodeGenData.inc | 12 ++- llvm/include/llvm/CGData/CodeGenDataReader.h | 29 ++- llvm/include/llvm/CGData/CodeGenDataWriter.h | 17 +++- llvm/lib/CGData/CodeGenData.cpp | 30 --- llvm/lib/CGData/CodeGenDataReader.cpp | 63 +- llvm/lib/CGData/CodeGenDataWriter.cpp | 30 ++- llvm/test/tools/llvm-cgdata/empty.test| 8 +- llvm/test/tools/llvm-cgdata/error.test| 13 +-- .../merge-combined-funcmap-hashtree.test | 66 +++ .../llvm-cgdata/merge-funcmap-archive.test| 83 +++ .../llvm-cgdata/merge-funcmap-concat.test | 78 + .../llvm-cgdata/merge-funcmap-double.test | 79 ++ .../llvm-cgdata/merge-funcmap-single.test | 36 ...chive.test => merge-hashtree-archive.test} | 8 +- ...concat.test => merge-hashtree-concat.test} | 6 +- ...double.test => merge-hashtree-double.test} | 8 +- ...single.test => merge-hashtree-single.test} | 4 +- llvm/tools/llvm-cgdata/llvm-cgdata.cpp| 48 --- 21 files changed, 577 insertions(+), 87 deletions(-) create mode 100644 llvm/test/tools/llvm-cgdata/merge-combined-funcmap-hashtree.test create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-archive.test create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-concat.test create mode 100644 
llvm/test/tools/llvm-cgdata/merge-funcmap-double.test create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-single.test rename llvm/test/tools/llvm-cgdata/{merge-archive.test => merge-hashtree-archive.test} (91%) rename llvm/test/tools/llvm-cgdata/{merge-concat.test => merge-hashtree-concat.test} (93%) rename llvm/test/tools/llvm-cgdata/{merge-double.test => merge-hashtree-double.test} (90%) rename llvm/test/tools/llvm-cgdata/{merge-single.test => merge-hashtree-single.test} (92%) diff --git a/lld/test/MachO/cgdata-generate.s b/lld/test/MachO/cgdata-generate.s index 174df39d666c5d..f942ae07f64e0e 100644 --- a/lld/test/MachO/cgdata-generate.s +++ b/lld/test/MachO/cgdata-generate.s @@ -3,12 +3,12 @@ # RUN: rm -rf %t; split-file %s %t -# Synthesize raw cgdata without the header (24 byte) from the indexed cgdata. +# Synthesize raw cgdata without the header (32 byte) from the indexed cgdata. # RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata -# RUN: od -t x1 -j 24 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt +# RUN: od -t x1 -j 32 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt # RUN: sed "s//$(cat %t/raw-1-bytes.txt)/g" %t/merge-template.s > %t/merge-1.s # RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata -# RUN: od -t x1 -j 24 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt +# RUN: od -t x1 -j 32 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt # RUN: sed "s//$(cat %t/raw-2-bytes.txt)/g" %t/merge-template.s > %t/merge-2.s # RUN: llvm-mc -filetype obj -triple arm64-apple-darwin %t/merge-1.s -o %t/merge-1.o diff --git a/llvm/docs/CommandGuide/llvm-cgdata.rst b/llvm/docs/CommandGuide/llvm-cgdata.rst index 
f592e1508844ee..0670decd087e39 100644 --- a/llvm/docs/CommandGuide/llvm-cgdata.rst +++ b/llvm/docs/CommandGuide/llvm-cgdata.rst @@ -11,15 +11,13 @@ SYNOPSIS DESCRIPTION --- -The :program:llvm-cgdata utility parses raw codegen data embedded -in compiled binary files and merges them into a single .cgdata file. -It can also inspect and manipulate .cgdata files. -Currently, the tool supports saving and restoring outlined hash trees, -enabling global function outlining across modules, allowing for more -efficient function outlining in subsequent compilations. -The design is extensible, allowing for the incorporation of additional -codegen summaries and optimization techniques, such as global function -merging, in the futur
[llvm-branch-commits] [llvm] [CGData] Global Merge Functions (PR #112671)
https://github.com/kyulee-com created https://github.com/llvm/llvm-project/pull/112671 None >From 2a690c75924de5feadb4a582d76822b4d4d1d2cf Mon Sep 17 00:00:00 2001 From: Kyungwoo Lee Date: Fri, 30 Aug 2024 00:09:09 -0700 Subject: [PATCH] [CGData] Global Merge Functions --- llvm/include/llvm/CGData/CodeGenData.h| 11 + llvm/include/llvm/InitializePasses.h | 1 + llvm/include/llvm/LinkAllPasses.h | 1 + llvm/include/llvm/Passes/CodeGenPassBuilder.h | 1 + llvm/include/llvm/Transforms/IPO.h| 2 + .../Transforms/IPO/GlobalMergeFunctions.h | 73 ++ llvm/lib/CodeGen/TargetPassConfig.cpp | 3 + llvm/lib/LTO/LTO.cpp | 1 + llvm/lib/Transforms/IPO/CMakeLists.txt| 2 + .../Transforms/IPO/GlobalMergeFunctions.cpp | 669 ++ .../ThinLTO/AArch64/cgdata-merge-local.ll | 62 ++ .../test/ThinLTO/AArch64/cgdata-merge-read.ll | 82 +++ .../AArch64/cgdata-merge-two-rounds.ll| 68 ++ .../ThinLTO/AArch64/cgdata-merge-write.ll | 97 +++ llvm/tools/llvm-lto2/CMakeLists.txt | 1 + llvm/tools/llvm-lto2/llvm-lto2.cpp| 6 + 16 files changed, 1080 insertions(+) create mode 100644 llvm/include/llvm/Transforms/IPO/GlobalMergeFunctions.h create mode 100644 llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-local.ll create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-read.ll create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-two-rounds.ll create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-write.ll diff --git a/llvm/include/llvm/CGData/CodeGenData.h b/llvm/include/llvm/CGData/CodeGenData.h index 5d7c74725ccef1..da0e412f2a0e03 100644 --- a/llvm/include/llvm/CGData/CodeGenData.h +++ b/llvm/include/llvm/CGData/CodeGenData.h @@ -145,6 +145,9 @@ class CodeGenData { const OutlinedHashTree *getOutlinedHashTree() { return PublishedHashTree.get(); } + const StableFunctionMap *getStableFunctionMap() { +return PublishedStableFunctionMap.get(); + } /// Returns true if we should write codegen data. 
bool emitCGData() { return EmitCGData; } @@ -169,10 +172,18 @@ inline bool hasOutlinedHashTree() { return CodeGenData::getInstance().hasOutlinedHashTree(); } +inline bool hasStableFunctionMap() { + return CodeGenData::getInstance().hasStableFunctionMap(); +} + inline const OutlinedHashTree *getOutlinedHashTree() { return CodeGenData::getInstance().getOutlinedHashTree(); } +inline const StableFunctionMap *getStableFunctionMap() { + return CodeGenData::getInstance().getStableFunctionMap(); +} + inline bool emitCGData() { return CodeGenData::getInstance().emitCGData(); } inline void diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index 4352099d6dbb99..9aa36d5bb7f801 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -123,6 +123,7 @@ void initializeGCEmptyBasicBlocksPass(PassRegistry &); void initializeGCMachineCodeAnalysisPass(PassRegistry &); void initializeGCModuleInfoPass(PassRegistry &); void initializeGVNLegacyPassPass(PassRegistry &); +void initializeGlobalMergeFuncPass(PassRegistry &); void initializeGlobalMergePass(PassRegistry &); void initializeGlobalsAAWrapperPassPass(PassRegistry &); void initializeHardwareLoopsLegacyPass(PassRegistry &); diff --git a/llvm/include/llvm/LinkAllPasses.h b/llvm/include/llvm/LinkAllPasses.h index 92b59a66567c95..ea3609a2b4bc71 100644 --- a/llvm/include/llvm/LinkAllPasses.h +++ b/llvm/include/llvm/LinkAllPasses.h @@ -79,6 +79,7 @@ struct ForcePassLinking { (void)llvm::createDomOnlyViewerWrapperPassPass(); (void)llvm::createDomViewerWrapperPassPass(); (void)llvm::createAlwaysInlinerLegacyPass(); +(void)llvm::createGlobalMergeFuncPass(); (void)llvm::createGlobalsAAWrapperPass(); (void)llvm::createInstSimplifyLegacyPass(); (void)llvm::createInstructionCombiningPass(); diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h index 13bc4700d87029..96b5b815132bc0 100644 --- 
a/llvm/include/llvm/Passes/CodeGenPassBuilder.h +++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h @@ -74,6 +74,7 @@ #include "llvm/Target/CGPassBuilderOption.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Transforms/CFGuard.h" +#include "llvm/Transforms/IPO/GlobalMergeFunctions.h" #include "llvm/Transforms/Scalar/ConstantHoisting.h" #include "llvm/Transforms/Scalar/LoopPassManager.h" #include "llvm/Transforms/Scalar/LoopStrengthReduce.h" diff --git a/llvm/include/llvm/Transforms/IPO.h b/llvm/include/llvm/Transforms/IPO.h index ee0e35aa618325..86a8654f56997c 100644 --- a/llvm/include/llvm/Transforms/IPO.h +++ b/llvm/include/llvm/Transforms/IPO.h @@ -55,6 +55,8 @@ enum class PassSummaryAction { Export, ///< Export information to summary. }; +Pass *createGlobalMerg
[llvm-branch-commits] [lld] [CGData][lld-macho] Add Global Merge Func Pass (PR #112674)
https://github.com/kyulee-com created https://github.com/llvm/llvm-project/pull/112674 None >From 36978c1da750496941705b284b3c34495b6f7386 Mon Sep 17 00:00:00 2001 From: Kyungwoo Lee Date: Wed, 16 Oct 2024 22:56:38 -0700 Subject: [PATCH] [CGData][lld-macho] Add Global Merge Func Pass --- lld/MachO/CMakeLists.txt | 2 + lld/MachO/Driver.cpp | 18 +- lld/MachO/InputSection.h | 1 + lld/MachO/LTO.cpp | 7 +++ lld/test/MachO/cgdata-generate-merge.s | 85 ++ 5 files changed, 112 insertions(+), 1 deletion(-) create mode 100644 lld/test/MachO/cgdata-generate-merge.s diff --git a/lld/MachO/CMakeLists.txt b/lld/MachO/CMakeLists.txt index ecf6ce609e59f2..137fe4939b4457 100644 --- a/lld/MachO/CMakeLists.txt +++ b/lld/MachO/CMakeLists.txt @@ -41,9 +41,11 @@ add_lld_library(lldMachO BitReader BitWriter CGData + CodeGen Core DebugInfoDWARF Demangle + IPO LTO MC ObjCARCOpts diff --git a/lld/MachO/Driver.cpp b/lld/MachO/Driver.cpp index ab4abb1fa97efc..59c24a06a2cb20 100644 --- a/lld/MachO/Driver.cpp +++ b/lld/MachO/Driver.cpp @@ -1326,7 +1326,8 @@ static void codegenDataGenerate() { TimeTraceScope timeScope("Generating codegen data"); OutlinedHashTreeRecord globalOutlineRecord; - for (ConcatInputSection *isec : inputSections) + StableFunctionMapRecord globalMergeRecord; + for (ConcatInputSection *isec : inputSections) { if (isec->getSegName() == segment_names::data && isec->getName() == section_names::outlinedHashTree) { // Read outlined hash tree from each section. @@ -1337,10 +1338,25 @@ static void codegenDataGenerate() { // Merge it to the global hash tree. globalOutlineRecord.merge(localOutlineRecord); } +if (isec->getSegName() == segment_names::data && +isec->getName() == section_names::functionmap) { + // Read stable functions from each section. + StableFunctionMapRecord localMergeRecord; + auto *data = isec->data.data(); + localMergeRecord.deserialize(data); + + // Merge it to the global function map. 
+ globalMergeRecord.merge(localMergeRecord); +} + } + + globalMergeRecord.finalize(); CodeGenDataWriter Writer; if (!globalOutlineRecord.empty()) Writer.addRecord(globalOutlineRecord); + if (!globalMergeRecord.empty()) +Writer.addRecord(globalMergeRecord); std::error_code EC; auto fileName = config->codegenDataGeneratePath; diff --git a/lld/MachO/InputSection.h b/lld/MachO/InputSection.h index 7ef0e31066f372..b86520d36cda5b 100644 --- a/lld/MachO/InputSection.h +++ b/lld/MachO/InputSection.h @@ -339,6 +339,7 @@ constexpr const char const_[] = "__const"; constexpr const char lazySymbolPtr[] = "__la_symbol_ptr"; constexpr const char lazyBinding[] = "__lazy_binding"; constexpr const char literals[] = "__literals"; +constexpr const char functionmap[] = "__llvm_merge"; constexpr const char moduleInitFunc[] = "__mod_init_func"; constexpr const char moduleTermFunc[] = "__mod_term_func"; constexpr const char nonLazySymbolPtr[] = "__nl_symbol_ptr"; diff --git a/lld/MachO/LTO.cpp b/lld/MachO/LTO.cpp index 28f5290edb58e3..9bddf9a6445f6d 100644 --- a/lld/MachO/LTO.cpp +++ b/lld/MachO/LTO.cpp @@ -25,6 +25,7 @@ #include "llvm/Support/FileSystem.h" #include "llvm/Support/Path.h" #include "llvm/Support/raw_ostream.h" +#include "llvm/Transforms/IPO.h" #include "llvm/Transforms/ObjCARC.h" using namespace lld; @@ -38,6 +39,8 @@ static std::string getThinLTOOutputFile(StringRef modulePath) { config->thinLTOPrefixReplaceNew); } +extern cl::opt EnableGlobalMergeFunc; + static lto::Config createConfig() { lto::Config c; c.Options = initTargetOptionsFromCodeGenFlags(); @@ -49,6 +52,10 @@ static lto::Config createConfig() { c.MAttrs = getMAttrs(); c.DiagHandler = diagnosticHandler; + c.PreCodeGenPassesHook = [](legacy::PassManager &pm) { +if (EnableGlobalMergeFunc) + pm.add(createGlobalMergeFuncPass()); + }; c.AlwaysEmitRegularLTOObj = !config->ltoObjPath.empty(); c.TimeTraceEnabled = config->timeTraceEnabled; diff --git a/lld/test/MachO/cgdata-generate-merge.s 
b/lld/test/MachO/cgdata-generate-merge.s new file mode 100644 index 00..3f7fb6777bc3cf --- /dev/null +++ b/lld/test/MachO/cgdata-generate-merge.s @@ -0,0 +1,85 @@ +# UNSUPPORTED: system-windows +# REQUIRES: aarch64 + +# RUN: rm -rf %t; split-file %s %t + +# Synthesize raw cgdata without the header (32 byte) from the indexed cgdata. +# RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata +# RUN: od -t x1 -j 32 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt +# RUN: sed "s//$(cat %t/raw-1-bytes.txt)/g" %t/merge-template.s > %t/merge-1.s +# RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata +# RUN: od -t
[llvm-branch-commits] [llvm] [CGData] Stable Function Map (PR #112662)
https://github.com/kyulee-com created https://github.com/llvm/llvm-project/pull/112662 These define the main data structures to represent stable functions and group similar functions in a function map. Serialization is supported in a binary or yaml form. >From e7272c3a0293a0b2972e893335d652cc1ea27ebc Mon Sep 17 00:00:00 2001 From: Kyungwoo Lee Date: Sat, 7 Sep 2024 22:48:17 -0700 Subject: [PATCH] [CGData] Stable Function Map These define the main data structures to represent stable functions and group similar functions in a function map. Serialization is supported in a binary or yaml form. --- llvm/include/llvm/CGData/StableFunctionMap.h | 139 .../llvm/CGData/StableFunctionMapRecord.h | 64 ++ llvm/lib/CGData/CMakeLists.txt| 2 + llvm/lib/CGData/StableFunctionMap.cpp | 167 +++ llvm/lib/CGData/StableFunctionMapRecord.cpp | 197 ++ llvm/unittests/CGData/CMakeLists.txt | 2 + .../CGData/StableFunctionMapRecordTest.cpp| 131 .../CGData/StableFunctionMapTest.cpp | 146 + 8 files changed, 848 insertions(+) create mode 100644 llvm/include/llvm/CGData/StableFunctionMap.h create mode 100644 llvm/include/llvm/CGData/StableFunctionMapRecord.h create mode 100644 llvm/lib/CGData/StableFunctionMap.cpp create mode 100644 llvm/lib/CGData/StableFunctionMapRecord.cpp create mode 100644 llvm/unittests/CGData/StableFunctionMapRecordTest.cpp create mode 100644 llvm/unittests/CGData/StableFunctionMapTest.cpp diff --git a/llvm/include/llvm/CGData/StableFunctionMap.h b/llvm/include/llvm/CGData/StableFunctionMap.h new file mode 100644 index 00..1dbc4257af1340 --- /dev/null +++ b/llvm/include/llvm/CGData/StableFunctionMap.h @@ -0,0 +1,139 @@ +//===- StableFunctionMap.h -*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===-===// +// +// TODO +// +//===-===// + +#ifndef LLVM_CGDATA_STABLEFUNCTIONMAP_H +#define LLVM_CGDATA_STABLEFUNCTIONMAP_H + +#include "llvm/ADT/DenseMap.h" +#include "llvm/ADT/StableHashing.h" +#include "llvm/ADT/StringMap.h" +#include "llvm/IR/StructuralHash.h" +#include "llvm/ObjectYAML/YAML.h" +#include "llvm/Support/raw_ostream.h" + +#include +#include + +namespace llvm { + +using IndexPairHash = std::pair; +using IndexOperandHashVecType = SmallVector; + +/// A stable function is a function with a stable hash while tracking the +/// locations of ignored operands and their hashes. +struct StableFunction { + /// The combined stable hash of the function. + stable_hash Hash; + /// The name of the function. + std::string FunctionName; + /// The name of the module the function is in. + std::string ModuleName; + /// The number of instructions. + unsigned InstCount; + /// A vector of pairs of IndexPair and operand hash which was skipped. + IndexOperandHashVecType IndexOperandHashes; + + StableFunction(stable_hash Hash, const std::string FunctionName, + const std::string ModuleName, unsigned InstCount, + IndexOperandHashVecType &&IndexOperandHashes) + : Hash(Hash), FunctionName(FunctionName), ModuleName(ModuleName), +InstCount(InstCount), +IndexOperandHashes(std::move(IndexOperandHashes)) {} + StableFunction() = default; +}; + +/// An efficient form of StableFunction for fast look-up +struct StableFunctionEntry { + /// The combined stable hash of the function. + stable_hash Hash; + /// Id of the function name. + unsigned FunctionNameId; + /// Id of the module name. + unsigned ModuleNameId; + /// The number of instructions. + unsigned InstCount; + /// A map from an IndexPair to a stable_hash which was skipped. 
+ std::unique_ptr IndexOperandHashMap; + + StableFunctionEntry( + stable_hash Hash, unsigned FunctionNameId, unsigned ModuleNameId, + unsigned InstCount, + std::unique_ptr IndexOperandHashMap) + : Hash(Hash), FunctionNameId(FunctionNameId), ModuleNameId(ModuleNameId), +InstCount(InstCount), +IndexOperandHashMap(std::move(IndexOperandHashMap)) {} +}; + +using HashFuncsMapType = +DenseMap>>; + +class StableFunctionMap { + /// A map from a stable_hash to a vector of functions with that hash. + HashFuncsMapType HashToFuncs; + /// A vector of strings to hold names. + SmallVector IdToName; + /// A map from StringRef (name) to an ID. + StringMap NameToId; + /// True if the function map is finalized with minimal content. + bool Finalized = false; + +public: + /// Get the HashToFuncs map for serialization. + const HashFuncsMapType &getFunctionMap() const { return HashToFuncs; } + + /// Get the N
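As a rough illustration of the layout described by the doc comments above (names interned to small integer ids, and entries grouped by their stable hash), here is a minimal standalone sketch. It is an assumption-laden toy: ToyFunctionMap and its members internName, nameOf, insert and candidates are hypothetical, and standard containers stand in for the LLVM ADTs the header uses.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Toy model only; not the StableFunctionMap class from the patch.
class ToyFunctionMap {
public:
  // Compact entry: names are stored once and referenced by id.
  struct Entry {
    unsigned FunctionNameId;
    unsigned ModuleNameId;
    unsigned InstCount;
  };

  // Return the id for Name, creating one on first use.
  unsigned internName(const std::string &Name) {
    auto It = NameToId.find(Name);
    if (It != NameToId.end())
      return It->second;
    unsigned Id = static_cast<unsigned>(IdToName.size());
    IdToName.push_back(Name);
    NameToId.emplace(Name, Id);
    return Id;
  }

  const std::string &nameOf(unsigned Id) const {
    assert(Id < IdToName.size() && "unknown name id");
    return IdToName[Id];
  }

  // Group the function under its hash; equal hashes land in one bucket.
  void insert(std::uint64_t Hash, const std::string &Function,
              const std::string &Module, unsigned InstCount) {
    HashToFuncs[Hash].push_back(
        Entry{internName(Function), internName(Module), InstCount});
  }

  // All functions recorded for Hash (empty if the hash was never seen).
  const std::vector<Entry> &candidates(std::uint64_t Hash) const {
    static const std::vector<Entry> Empty;
    auto It = HashToFuncs.find(Hash);
    return It == HashToFuncs.end() ? Empty : It->second;
  }

private:
  std::vector<std::string> IdToName;
  std::unordered_map<std::string, unsigned> NameToId;
  std::unordered_map<std::uint64_t, std::vector<Entry>> HashToFuncs;
};
```

Interning keeps each module or function name in memory once, however many entries refer to it, which is presumably why the header distinguishes a string-carrying StableFunction from an id-carrying StableFunctionEntry.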
[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)
https://github.com/kyulee-com created https://github.com/llvm/llvm-project/pull/112638 This computes a structural hash while allowing for selective ignoring of certain operands based on a custom function that is provided. Instead of a single hash value, it now returns FunctionHashInfo which includes a hash value, an instruction mapping, and a map to track the operand location and its corresponding hash value that is ignored. >From 6225d74229d41068c57109a24b063f6fcba13985 Mon Sep 17 00:00:00 2001 From: Kyungwoo Lee Date: Wed, 16 Oct 2024 17:09:07 -0700 Subject: [PATCH] [StructuralHash] Support Differences This computes a structural hash while allowing for selective ignoring of certain operands based on a custom function that is provided. Instead of a single hash value, it now returns FunctionHashInfo which includes a hash value, an instruction mapping, and a map to track the operand location and its corresponding hash value that is ignored. --- llvm/include/llvm/IR/StructuralHash.h| 46 ++ llvm/lib/IR/StructuralHash.cpp | 188 +-- llvm/unittests/IR/StructuralHashTest.cpp | 55 +++ 3 files changed, 275 insertions(+), 14 deletions(-) diff --git a/llvm/include/llvm/IR/StructuralHash.h b/llvm/include/llvm/IR/StructuralHash.h index aa292bc3446799..bc82c204c4d1f6 100644 --- a/llvm/include/llvm/IR/StructuralHash.h +++ b/llvm/include/llvm/IR/StructuralHash.h @@ -14,7 +14,9 @@ #ifndef LLVM_IR_STRUCTURALHASH_H #define LLVM_IR_STRUCTURALHASH_H +#include "llvm/ADT/MapVector.h" #include "llvm/ADT/StableHashing.h" +#include "llvm/IR/Instruction.h" #include namespace llvm { @@ -23,6 +25,7 @@ class Function; class Module; using IRHash = stable_hash; +using OpndHash = stable_hash; /// Returns a hash of the function \p F. /// \param F The function to hash. @@ -37,6 +40,49 @@ IRHash StructuralHash(const Function &F, bool DetailedHash = false); /// composed the module hash. 
IRHash StructuralHash(const Module &M, bool DetailedHash = false); +/// The pair of an instruction index and a operand index. +using IndexPair = std::pair; + +/// A map from an instruction index to an instruction pointer. +using IndexInstrMap = MapVector; + +/// A map from an IndexPair to an OpndHash. +using IndexOperandHashMapType = DenseMap; + +/// A function that takes an instruction and an operand index and returns true +/// if the operand should be ignored in the function hash computation. +using IgnoreOperandFunc = std::function; + +struct FunctionHashInfo { + /// A hash value representing the structural content of the function + IRHash FunctionHash; + /// A mapping from instruction indices to instruction pointers + std::unique_ptr IndexInstruction; + /// A mapping from pairs of instruction indices and operand indices + /// to the hashes of the operands. This can be used to analyze or + /// reconstruct the differences in ignored operands + std::unique_ptr IndexOperandHashMap; + + FunctionHashInfo(IRHash FuntionHash, + std::unique_ptr IndexInstruction, + std::unique_ptr IndexOperandHashMap) + : FunctionHash(FuntionHash), +IndexInstruction(std::move(IndexInstruction)), +IndexOperandHashMap(std::move(IndexOperandHashMap)) {} +}; + +/// Computes a structural hash of a given function, considering the structure +/// and content of the function's instructions while allowing for selective +/// ignoring of certain operands based on custom criteria. This hash can be used +/// to identify functions that are structurally similar or identical, which is +/// useful in optimizations, deduplication, or analysis tasks. +/// \param F The function to hash. +/// \param IgnoreOp A callable that takes an instruction and an operand index, +/// and returns true if the operand should be ignored in the hash computation. 
+/// \return A FunctionHashInfo structure +FunctionHashInfo StructuralHashWithDifferences(const Function &F, + IgnoreOperandFunc IgnoreOp); + } // end namespace llvm #endif diff --git a/llvm/lib/IR/StructuralHash.cpp b/llvm/lib/IR/StructuralHash.cpp index a1fabab77d52b2..6e0af666010a05 100644 --- a/llvm/lib/IR/StructuralHash.cpp +++ b/llvm/lib/IR/StructuralHash.cpp @@ -28,6 +28,19 @@ class StructuralHashImpl { bool DetailedHash; + /// IgnoreOp is a function that returns true if the operand should be ignored. + IgnoreOperandFunc IgnoreOp = nullptr; + /// A mapping from instruction indices to instruction pointers. + /// The index represents the position of an instruction based on the order in + /// which it is first encountered. + std::unique_ptr IndexInstruction = nullptr; + /// A mapping from pairs of instruction indices and operand indices + /// to the hashes of the operands. + std::unique_ptr IndexOperandHashMap = nullptr; + + /// Assign a unique ID to each Value in the order they are first seen. + DenseMap ValueToId; + // This will produce diff
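As a rough illustration of the idea in the patch description above (hash a function while skipping selected operands, and remember what was skipped), here is a minimal standalone sketch. It does not use LLVM: ToyInst, ToyHashInfo, toyHashWithDifferences and the FNV-style mixing are all hypothetical stand-ins chosen for this example, not the types or the hashing scheme the patch implements.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR instruction: an opcode plus operand names.
struct ToyInst {
  std::string Opcode;
  std::vector<std::string> Operands;
};

using ToyHash = std::uint64_t;
using ToyIndexPair = std::pair<unsigned, unsigned>; // (instruction, operand)
using ToyIgnoreFunc = std::function<bool(const ToyInst &, unsigned)>;

// Hypothetical result type: the combined hash plus the hashes of the
// operands that were left out of it, keyed by their location.
struct ToyHashInfo {
  ToyHash FunctionHash = 0;
  std::map<ToyIndexPair, ToyHash> IgnoredOperandHashes;
};

// FNV-1a over a string; an assumption for the sketch, not LLVM's stable_hash.
inline ToyHash hashString(const std::string &S) {
  ToyHash H = 1469598103934665603ULL;
  for (unsigned char C : S) {
    H ^= C;
    H *= 1099511628211ULL;
  }
  return H;
}

// Order-dependent mixing of a running hash with a new value.
inline ToyHash combine(ToyHash A, ToyHash B) {
  return (A ^ B) * 1099511628211ULL;
}

inline ToyHashInfo toyHashWithDifferences(const std::vector<ToyInst> &Body,
                                          const ToyIgnoreFunc &IgnoreOp) {
  ToyHashInfo Info;
  ToyHash H = 0;
  for (unsigned I = 0; I < Body.size(); ++I) {
    H = combine(H, hashString(Body[I].Opcode));
    for (unsigned J = 0; J < Body[I].Operands.size(); ++J) {
      ToyHash OpH = hashString(Body[I].Operands[J]);
      if (IgnoreOp && IgnoreOp(Body[I], J))
        Info.IgnoredOperandHashes[{I, J}] = OpH; // recorded, not mixed in
      else
        H = combine(H, OpH);
    }
  }
  Info.FunctionHash = H;
  return Info;
}
```

With a predicate that ignores operand 0 of every "call", two toy functions that differ only in their callee get the same FunctionHash, while the recorded per-location hashes still tell them apart.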
[llvm-branch-commits] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)
https://github.com/kyulee-com created https://github.com/llvm/llvm-project/pull/112664 This introduces a new cgdata format for stable function maps. The raw data is embedded in the __llvm_merge section during compile time. This data can be read and merged using the llvm-cgdata tool, into an indexed cgdata file. Consequently, the tool is now capable of handling either outlined hash trees, stable function maps, or both, as they are orthogonal. >From af5931f2a7aa020afed0ad474b6e6a7e4c564703 Mon Sep 17 00:00:00 2001 From: Kyungwoo Lee Date: Mon, 9 Sep 2024 19:38:05 -0700 Subject: [PATCH] [CGData][llvm-cgdata] Support for stable function map This introduces a new cgdata format for stable function maps. The raw data is embedded in the __llvm_merge section during compile time. This data can be read and merged using the llvm-cgdata tool, into an indexed cgdata file. Consequently, the tool is now capable of handling either outlined hash trees, stable function maps, or both, as they are orthogonal. 
--- llvm/docs/CommandGuide/llvm-cgdata.rst| 16 ++-- llvm/include/llvm/CGData/CodeGenData.h| 24 +- llvm/include/llvm/CGData/CodeGenData.inc | 12 ++- llvm/include/llvm/CGData/CodeGenDataReader.h | 29 ++- llvm/include/llvm/CGData/CodeGenDataWriter.h | 17 +++- llvm/lib/CGData/CodeGenData.cpp | 30 --- llvm/lib/CGData/CodeGenDataReader.cpp | 63 +- llvm/lib/CGData/CodeGenDataWriter.cpp | 30 ++- llvm/test/tools/llvm-cgdata/empty.test| 8 +- llvm/test/tools/llvm-cgdata/error.test| 13 +-- .../merge-combined-funcmap-hashtree.test | 66 +++ .../llvm-cgdata/merge-funcmap-archive.test| 83 +++ .../llvm-cgdata/merge-funcmap-concat.test | 78 + .../llvm-cgdata/merge-funcmap-double.test | 79 ++ .../llvm-cgdata/merge-funcmap-single.test | 36 ...chive.test => merge-hashtree-archive.test} | 8 +- ...concat.test => merge-hashtree-concat.test} | 6 +- ...double.test => merge-hashtree-double.test} | 8 +- ...single.test => merge-hashtree-single.test} | 4 +- llvm/tools/llvm-cgdata/llvm-cgdata.cpp| 46 +++--- 20 files changed, 572 insertions(+), 84 deletions(-) create mode 100644 llvm/test/tools/llvm-cgdata/merge-combined-funcmap-hashtree.test create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-archive.test create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-concat.test create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-double.test create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-single.test rename llvm/test/tools/llvm-cgdata/{merge-archive.test => merge-hashtree-archive.test} (91%) rename llvm/test/tools/llvm-cgdata/{merge-concat.test => merge-hashtree-concat.test} (93%) rename llvm/test/tools/llvm-cgdata/{merge-double.test => merge-hashtree-double.test} (90%) rename llvm/test/tools/llvm-cgdata/{merge-single.test => merge-hashtree-single.test} (92%) diff --git a/llvm/docs/CommandGuide/llvm-cgdata.rst b/llvm/docs/CommandGuide/llvm-cgdata.rst index f592e1508844ee..0670decd087e39 100644 --- a/llvm/docs/CommandGuide/llvm-cgdata.rst +++ 
b/llvm/docs/CommandGuide/llvm-cgdata.rst @@ -11,15 +11,13 @@ SYNOPSIS DESCRIPTION --- -The :program:llvm-cgdata utility parses raw codegen data embedded -in compiled binary files and merges them into a single .cgdata file. -It can also inspect and manipulate .cgdata files. -Currently, the tool supports saving and restoring outlined hash trees, -enabling global function outlining across modules, allowing for more -efficient function outlining in subsequent compilations. -The design is extensible, allowing for the incorporation of additional -codegen summaries and optimization techniques, such as global function -merging, in the future. +The :program:llvm-cgdata utility parses raw codegen data embedded in compiled +binary files and merges them into a single .cgdata file. It can also inspect +and manipulate .cgdata files. Currently, the tool supports saving and restoring +outlined hash trees and stable function maps, allowing for more efficient +function outlining and function merging across modules in subsequent +compilations. The design is extensible, allowing for the incorporation of +additional codegen summaries and optimization techniques. COMMANDS diff --git a/llvm/include/llvm/CGData/CodeGenData.h b/llvm/include/llvm/CGData/CodeGenData.h index 53550beeae1f83..5d7c74725ccef1 100644 --- a/llvm/include/llvm/CGData/CodeGenData.h +++ b/llvm/include/llvm/CGData/CodeGenData.h @@ -19,6 +19,7 @@ #include "llvm/Bitcode/BitcodeReader.h" #include "llvm/CGData/OutlinedHashTree.h" #include "llvm/CGData/OutlinedHashTreeRecord.h" +#include "llvm/CGData/StableFunctionMapRecord.h" #include "llvm/IR/Module.h" #include "llvm/Object/ObjectFile.h" #include "llvm/Support/Caching.h" @@ -41,7 +42,9 @@ enum class CGDataKind {
[llvm-branch-commits] [llvm] [CGData] Global Merge Functions (PR #112671)
https://github.com/kyulee-com updated https://github.com/llvm/llvm-project/pull/112671 >From 1601086634428b95d1a195e5ecb8f5b9d1f1709c Mon Sep 17 00:00:00 2001 From: Kyungwoo Lee Date: Fri, 30 Aug 2024 00:09:09 -0700 Subject: [PATCH] [CGData] Global Merge Functions --- llvm/include/llvm/CGData/CodeGenData.h| 11 + llvm/include/llvm/InitializePasses.h | 1 + llvm/include/llvm/LinkAllPasses.h | 1 + llvm/include/llvm/Passes/CodeGenPassBuilder.h | 1 + llvm/include/llvm/Transforms/IPO.h| 2 + .../Transforms/IPO/GlobalMergeFunctions.h | 77 ++ llvm/lib/CodeGen/TargetPassConfig.cpp | 3 + llvm/lib/LTO/LTO.cpp | 1 + llvm/lib/Transforms/IPO/CMakeLists.txt| 2 + .../Transforms/IPO/GlobalMergeFunctions.cpp | 687 ++ .../ThinLTO/AArch64/cgdata-merge-local.ll | 62 ++ .../test/ThinLTO/AArch64/cgdata-merge-read.ll | 82 +++ .../AArch64/cgdata-merge-two-rounds.ll| 68 ++ .../ThinLTO/AArch64/cgdata-merge-write.ll | 97 +++ llvm/tools/llvm-lto2/CMakeLists.txt | 1 + llvm/tools/llvm-lto2/llvm-lto2.cpp| 6 + 16 files changed, 1102 insertions(+) create mode 100644 llvm/include/llvm/Transforms/IPO/GlobalMergeFunctions.h create mode 100644 llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-local.ll create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-read.ll create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-two-rounds.ll create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-write.ll diff --git a/llvm/include/llvm/CGData/CodeGenData.h b/llvm/include/llvm/CGData/CodeGenData.h index 5d7c74725ccef1..da0e412f2a0e03 100644 --- a/llvm/include/llvm/CGData/CodeGenData.h +++ b/llvm/include/llvm/CGData/CodeGenData.h @@ -145,6 +145,9 @@ class CodeGenData { const OutlinedHashTree *getOutlinedHashTree() { return PublishedHashTree.get(); } + const StableFunctionMap *getStableFunctionMap() { +return PublishedStableFunctionMap.get(); + } /// Returns true if we should write codegen data. 
bool emitCGData() { return EmitCGData; } @@ -169,10 +172,18 @@ inline bool hasOutlinedHashTree() { return CodeGenData::getInstance().hasOutlinedHashTree(); } +inline bool hasStableFunctionMap() { + return CodeGenData::getInstance().hasStableFunctionMap(); +} + inline const OutlinedHashTree *getOutlinedHashTree() { return CodeGenData::getInstance().getOutlinedHashTree(); } +inline const StableFunctionMap *getStableFunctionMap() { + return CodeGenData::getInstance().getStableFunctionMap(); +} + inline bool emitCGData() { return CodeGenData::getInstance().emitCGData(); } inline void diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index 4352099d6dbb99..9aa36d5bb7f801 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -123,6 +123,7 @@ void initializeGCEmptyBasicBlocksPass(PassRegistry &); void initializeGCMachineCodeAnalysisPass(PassRegistry &); void initializeGCModuleInfoPass(PassRegistry &); void initializeGVNLegacyPassPass(PassRegistry &); +void initializeGlobalMergeFuncPass(PassRegistry &); void initializeGlobalMergePass(PassRegistry &); void initializeGlobalsAAWrapperPassPass(PassRegistry &); void initializeHardwareLoopsLegacyPass(PassRegistry &); diff --git a/llvm/include/llvm/LinkAllPasses.h b/llvm/include/llvm/LinkAllPasses.h index 92b59a66567c95..ea3609a2b4bc71 100644 --- a/llvm/include/llvm/LinkAllPasses.h +++ b/llvm/include/llvm/LinkAllPasses.h @@ -79,6 +79,7 @@ struct ForcePassLinking { (void)llvm::createDomOnlyViewerWrapperPassPass(); (void)llvm::createDomViewerWrapperPassPass(); (void)llvm::createAlwaysInlinerLegacyPass(); +(void)llvm::createGlobalMergeFuncPass(); (void)llvm::createGlobalsAAWrapperPass(); (void)llvm::createInstSimplifyLegacyPass(); (void)llvm::createInstructionCombiningPass(); diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h index 13bc4700d87029..96b5b815132bc0 100644 --- 
a/llvm/include/llvm/Passes/CodeGenPassBuilder.h +++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h @@ -74,6 +74,7 @@ #include "llvm/Target/CGPassBuilderOption.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Transforms/CFGuard.h" +#include "llvm/Transforms/IPO/GlobalMergeFunctions.h" #include "llvm/Transforms/Scalar/ConstantHoisting.h" #include "llvm/Transforms/Scalar/LoopPassManager.h" #include "llvm/Transforms/Scalar/LoopStrengthReduce.h" diff --git a/llvm/include/llvm/Transforms/IPO.h b/llvm/include/llvm/Transforms/IPO.h index ee0e35aa618325..86a8654f56997c 100644 --- a/llvm/include/llvm/Transforms/IPO.h +++ b/llvm/include/llvm/Transforms/IPO.h @@ -55,6 +55,8 @@ enum class PassSummaryAction { Export, ///< Export information to summary. }; +Pass *createGlobalMergeFuncP
[llvm-branch-commits] [lld] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)
https://github.com/kyulee-com updated https://github.com/llvm/llvm-project/pull/112664 >From 09f1ec7730868a53cb566b0913e7952dfc15fa16 Mon Sep 17 00:00:00 2001 From: Kyungwoo Lee Date: Mon, 9 Sep 2024 19:38:05 -0700 Subject: [PATCH] [CGData][llvm-cgdata] Support for stable function map This introduces a new cgdata format for stable function maps. The raw data is embedded in the __llvm_merge section during compile time. This data can be read and merged using the llvm-cgdata tool, into an indexed cgdata file. Consequently, the tool is now capable of handling either outlined hash trees, stable function maps, or both, as they are orthogonal. --- lld/test/MachO/cgdata-generate.s | 6 +- llvm/docs/CommandGuide/llvm-cgdata.rst| 16 ++-- llvm/include/llvm/CGData/CodeGenData.h| 24 +- llvm/include/llvm/CGData/CodeGenData.inc | 12 ++- llvm/include/llvm/CGData/CodeGenDataReader.h | 29 ++- llvm/include/llvm/CGData/CodeGenDataWriter.h | 17 +++- llvm/lib/CGData/CodeGenData.cpp | 30 --- llvm/lib/CGData/CodeGenDataReader.cpp | 63 +- llvm/lib/CGData/CodeGenDataWriter.cpp | 30 ++- llvm/test/tools/llvm-cgdata/empty.test| 8 +- llvm/test/tools/llvm-cgdata/error.test| 13 +-- .../merge-combined-funcmap-hashtree.test | 66 +++ .../llvm-cgdata/merge-funcmap-archive.test| 83 +++ .../llvm-cgdata/merge-funcmap-concat.test | 78 + .../llvm-cgdata/merge-funcmap-double.test | 79 ++ .../llvm-cgdata/merge-funcmap-single.test | 36 ...chive.test => merge-hashtree-archive.test} | 8 +- ...concat.test => merge-hashtree-concat.test} | 6 +- ...double.test => merge-hashtree-double.test} | 8 +- ...single.test => merge-hashtree-single.test} | 4 +- llvm/tools/llvm-cgdata/llvm-cgdata.cpp| 48 --- 21 files changed, 577 insertions(+), 87 deletions(-) create mode 100644 llvm/test/tools/llvm-cgdata/merge-combined-funcmap-hashtree.test create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-archive.test create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-concat.test create mode 100644 
llvm/test/tools/llvm-cgdata/merge-funcmap-double.test create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-single.test rename llvm/test/tools/llvm-cgdata/{merge-archive.test => merge-hashtree-archive.test} (91%) rename llvm/test/tools/llvm-cgdata/{merge-concat.test => merge-hashtree-concat.test} (93%) rename llvm/test/tools/llvm-cgdata/{merge-double.test => merge-hashtree-double.test} (90%) rename llvm/test/tools/llvm-cgdata/{merge-single.test => merge-hashtree-single.test} (92%) diff --git a/lld/test/MachO/cgdata-generate.s b/lld/test/MachO/cgdata-generate.s index 174df39d666c5d..f942ae07f64e0e 100644 --- a/lld/test/MachO/cgdata-generate.s +++ b/lld/test/MachO/cgdata-generate.s @@ -3,12 +3,12 @@ # RUN: rm -rf %t; split-file %s %t -# Synthesize raw cgdata without the header (24 byte) from the indexed cgdata. +# Synthesize raw cgdata without the header (32 byte) from the indexed cgdata. # RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata -# RUN: od -t x1 -j 24 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt +# RUN: od -t x1 -j 32 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt # RUN: sed "s//$(cat %t/raw-1-bytes.txt)/g" %t/merge-template.s > %t/merge-1.s # RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata -# RUN: od -t x1 -j 24 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt +# RUN: od -t x1 -j 32 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt # RUN: sed "s//$(cat %t/raw-2-bytes.txt)/g" %t/merge-template.s > %t/merge-2.s # RUN: llvm-mc -filetype obj -triple arm64-apple-darwin %t/merge-1.s -o %t/merge-1.o diff --git a/llvm/docs/CommandGuide/llvm-cgdata.rst b/llvm/docs/CommandGuide/llvm-cgdata.rst index 
f592e1508844ee..0670decd087e39 100644 --- a/llvm/docs/CommandGuide/llvm-cgdata.rst +++ b/llvm/docs/CommandGuide/llvm-cgdata.rst @@ -11,15 +11,13 @@ SYNOPSIS DESCRIPTION --- -The :program:llvm-cgdata utility parses raw codegen data embedded -in compiled binary files and merges them into a single .cgdata file. -It can also inspect and manipulate .cgdata files. -Currently, the tool supports saving and restoring outlined hash trees, -enabling global function outlining across modules, allowing for more -efficient function outlining in subsequent compilations. -The design is extensible, allowing for the incorporation of additional -codegen summaries and optimization techniques, such as global function -merging, in the futur
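For readers unfamiliar with the od/tr/sed pipeline in the cgdata-generate.s RUN lines above: it skips the header bytes and turns the rest of the file into a comma-separated list of 0x-prefixed hex bytes. A small sketch of the same transformation follows; toHexByteList is a hypothetical helper written for this note and is not part of the test or of llvm-cgdata.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Skip the first HeaderSize bytes and render the remainder as
// "0xNN,0xNN,..." with lowercase two-digit hex, like od -t x1 plus sed.
inline std::string toHexByteList(const std::vector<unsigned char> &Data,
                                 std::size_t HeaderSize) {
  std::string Out;
  for (std::size_t I = HeaderSize; I < Data.size(); ++I) {
    char Buf[8]; // ",0xNN" plus the terminating NUL fits comfortably
    std::snprintf(Buf, sizeof(Buf), "%s0x%02x", Out.empty() ? "" : ",",
                  static_cast<unsigned>(Data[I]));
    Out += Buf;
  }
  return Out;
}
```

Passing 32 as HeaderSize would correspond to the updated `-j 32` in the test; the value itself comes from the patch, the helper does not check it.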
[llvm-branch-commits] [llvm] [CGData] Stable Function Map (PR #112662)
https://github.com/kyulee-com updated https://github.com/llvm/llvm-project/pull/112662 >From 060a23e39a68729859bb7b74e38586b0356e2ba6 Mon Sep 17 00:00:00 2001 From: Kyungwoo Lee Date: Sat, 7 Sep 2024 22:48:17 -0700 Subject: [PATCH] [CGData] Stable Function Map These define the main data structures to represent stable functions and group similar functions in a function map. Serialization is supported in a binary or yaml form. --- llvm/include/llvm/CGData/StableFunctionMap.h | 139 .../llvm/CGData/StableFunctionMapRecord.h | 68 ++ llvm/lib/CGData/CMakeLists.txt| 2 + llvm/lib/CGData/StableFunctionMap.cpp | 167 +++ llvm/lib/CGData/StableFunctionMapRecord.cpp | 202 ++ llvm/unittests/CGData/CMakeLists.txt | 2 + .../CGData/StableFunctionMapRecordTest.cpp| 131 .../CGData/StableFunctionMapTest.cpp | 146 + 8 files changed, 857 insertions(+) create mode 100644 llvm/include/llvm/CGData/StableFunctionMap.h create mode 100644 llvm/include/llvm/CGData/StableFunctionMapRecord.h create mode 100644 llvm/lib/CGData/StableFunctionMap.cpp create mode 100644 llvm/lib/CGData/StableFunctionMapRecord.cpp create mode 100644 llvm/unittests/CGData/StableFunctionMapRecordTest.cpp create mode 100644 llvm/unittests/CGData/StableFunctionMapTest.cpp diff --git a/llvm/include/llvm/CGData/StableFunctionMap.h b/llvm/include/llvm/CGData/StableFunctionMap.h new file mode 100644 index 00..ec205ef846f5c9 --- /dev/null +++ b/llvm/include/llvm/CGData/StableFunctionMap.h @@ -0,0 +1,139 @@ +//===- StableFunctionMap.h -*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===-===// +// +// TODO +// +//===-===// + +#ifndef LLVM_CGDATA_STABLEFUNCTIONMAP_H +#define LLVM_CGDATA_STABLEFUNCTIONMAP_H + +#include "llvm/ADT/DenseMap.h" +#include "llvm/ADT/StableHashing.h" +#include "llvm/ADT/StringMap.h" +#include "llvm/IR/StructuralHash.h" +#include "llvm/ObjectYAML/YAML.h" +#include "llvm/Support/raw_ostream.h" + +#include +#include + +namespace llvm { + +using IndexPairHash = std::pair; +using IndexOperandHashVecType = SmallVector; + +/// A stable function is a function with a stable hash while tracking the +/// locations of ignored operands and their hashes. +struct StableFunction { + /// The combined stable hash of the function. + stable_hash Hash; + /// The name of the function. + std::string FunctionName; + /// The name of the module the function is in. + std::string ModuleName; + /// The number of instructions. + unsigned InstCount; + /// A vector of pairs of IndexPair and operand hash which was skipped. + IndexOperandHashVecType IndexOperandHashes; + + StableFunction(stable_hash Hash, const std::string FunctionName, + const std::string ModuleName, unsigned InstCount, + IndexOperandHashVecType &&IndexOperandHashes) + : Hash(Hash), FunctionName(FunctionName), ModuleName(ModuleName), +InstCount(InstCount), +IndexOperandHashes(std::move(IndexOperandHashes)) {} + StableFunction() = default; +}; + +/// An efficient form of StableFunction for fast look-up +struct StableFunctionEntry { + /// The combined stable hash of the function. + stable_hash Hash; + /// Id of the function name. + unsigned FunctionNameId; + /// Id of the module name. + unsigned ModuleNameId; + /// The number of instructions. + unsigned InstCount; + /// A map from an IndexPair to a stable_hash which was skipped. 
+ std::unique_ptr IndexOperandHashMap; + + StableFunctionEntry( + stable_hash Hash, unsigned FunctionNameId, unsigned ModuleNameId, + unsigned InstCount, + std::unique_ptr IndexOperandHashMap) + : Hash(Hash), FunctionNameId(FunctionNameId), ModuleNameId(ModuleNameId), +InstCount(InstCount), +IndexOperandHashMap(std::move(IndexOperandHashMap)) {} +}; + +using HashFuncsMapType = +DenseMap>>; + +class StableFunctionMap { + /// A map from a stable_hash to a vector of functions with that hash. + HashFuncsMapType HashToFuncs; + /// A vector of strings to hold names. + SmallVector IdToName; + /// A map from StringRef (name) to an ID. + StringMap NameToId; + /// True if the function map is finalized with minimal content. + bool Finalized = false; + +public: + /// Get the HashToFuncs map for serialization. + const HashFuncsMapType &getFunctionMap() const { return HashToFuncs; } + + /// Get the NameToId vector for serialization. + const SmallVector getNames() const { return IdToName; } + + /// Get an existing ID associated with the given name or create a new ID
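The IdToName / NameToId pair in the header above describes a name-interning table: each distinct name gets a small integer ID, so entries can store IDs instead of strings. A minimal sketch of that pattern using only the standard library is below; ToyNameTable, internName and nameFor are hypothetical names for this illustration, and the patch itself uses LLVM containers rather than these types.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical interning table: names map to dense IDs in first-seen order.
class ToyNameTable {
  std::vector<std::string> IdToName;
  std::unordered_map<std::string, unsigned> NameToId;

public:
  // Return the existing ID for Name, or assign the next free ID.
  unsigned internName(const std::string &Name) {
    auto [It, Inserted] =
        NameToId.try_emplace(Name, static_cast<unsigned>(IdToName.size()));
    if (Inserted)
      IdToName.push_back(Name);
    return It->second;
  }

  // Look a name up by ID; empty if the ID was never handed out.
  std::optional<std::string> nameFor(unsigned Id) const {
    if (Id >= IdToName.size())
      return std::nullopt;
    return IdToName[Id];
  }

  std::size_t size() const { return IdToName.size(); }
};
```

Interning the same name twice returns the same ID, which is what lets a compact entry such as StableFunctionEntry refer to function and module names by number.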
[llvm-branch-commits] [lld] [CGData][lld-macho] Add Global Merge Func Pass (PR #112674)
https://github.com/kyulee-com updated https://github.com/llvm/llvm-project/pull/112674 >From 6b0b6194a02209036e032a8941f8e5817b402318 Mon Sep 17 00:00:00 2001 From: Kyungwoo Lee Date: Wed, 16 Oct 2024 22:56:38 -0700 Subject: [PATCH] [CGData][lld-macho] Add Global Merge Func Pass --- lld/MachO/CMakeLists.txt | 2 + lld/MachO/Driver.cpp | 18 +- lld/MachO/InputSection.h | 1 + lld/MachO/LTO.cpp | 7 +++ lld/test/MachO/cgdata-generate-merge.s | 85 ++ 5 files changed, 112 insertions(+), 1 deletion(-) create mode 100644 lld/test/MachO/cgdata-generate-merge.s diff --git a/lld/MachO/CMakeLists.txt b/lld/MachO/CMakeLists.txt index ecf6ce609e59f2..137fe4939b4457 100644 --- a/lld/MachO/CMakeLists.txt +++ b/lld/MachO/CMakeLists.txt @@ -41,9 +41,11 @@ add_lld_library(lldMachO BitReader BitWriter CGData + CodeGen Core DebugInfoDWARF Demangle + IPO LTO MC ObjCARCOpts diff --git a/lld/MachO/Driver.cpp b/lld/MachO/Driver.cpp index ab4abb1fa97efc..59c24a06a2cb20 100644 --- a/lld/MachO/Driver.cpp +++ b/lld/MachO/Driver.cpp @@ -1326,7 +1326,8 @@ static void codegenDataGenerate() { TimeTraceScope timeScope("Generating codegen data"); OutlinedHashTreeRecord globalOutlineRecord; - for (ConcatInputSection *isec : inputSections) + StableFunctionMapRecord globalMergeRecord; + for (ConcatInputSection *isec : inputSections) { if (isec->getSegName() == segment_names::data && isec->getName() == section_names::outlinedHashTree) { // Read outlined hash tree from each section. @@ -1337,10 +1338,25 @@ static void codegenDataGenerate() { // Merge it to the global hash tree. globalOutlineRecord.merge(localOutlineRecord); } +if (isec->getSegName() == segment_names::data && +isec->getName() == section_names::functionmap) { + // Read stable functions from each section. + StableFunctionMapRecord localMergeRecord; + auto *data = isec->data.data(); + localMergeRecord.deserialize(data); + + // Merge it to the global function map. 
+ globalMergeRecord.merge(localMergeRecord); +} + } + + globalMergeRecord.finalize(); CodeGenDataWriter Writer; if (!globalOutlineRecord.empty()) Writer.addRecord(globalOutlineRecord); + if (!globalMergeRecord.empty()) +Writer.addRecord(globalMergeRecord); std::error_code EC; auto fileName = config->codegenDataGeneratePath; diff --git a/lld/MachO/InputSection.h b/lld/MachO/InputSection.h index 7ef0e31066f372..b86520d36cda5b 100644 --- a/lld/MachO/InputSection.h +++ b/lld/MachO/InputSection.h @@ -339,6 +339,7 @@ constexpr const char const_[] = "__const"; constexpr const char lazySymbolPtr[] = "__la_symbol_ptr"; constexpr const char lazyBinding[] = "__lazy_binding"; constexpr const char literals[] = "__literals"; +constexpr const char functionmap[] = "__llvm_merge"; constexpr const char moduleInitFunc[] = "__mod_init_func"; constexpr const char moduleTermFunc[] = "__mod_term_func"; constexpr const char nonLazySymbolPtr[] = "__nl_symbol_ptr"; diff --git a/lld/MachO/LTO.cpp b/lld/MachO/LTO.cpp index 28f5290edb58e3..9bddf9a6445f6d 100644 --- a/lld/MachO/LTO.cpp +++ b/lld/MachO/LTO.cpp @@ -25,6 +25,7 @@ #include "llvm/Support/FileSystem.h" #include "llvm/Support/Path.h" #include "llvm/Support/raw_ostream.h" +#include "llvm/Transforms/IPO.h" #include "llvm/Transforms/ObjCARC.h" using namespace lld; @@ -38,6 +39,8 @@ static std::string getThinLTOOutputFile(StringRef modulePath) { config->thinLTOPrefixReplaceNew); } +extern cl::opt EnableGlobalMergeFunc; + static lto::Config createConfig() { lto::Config c; c.Options = initTargetOptionsFromCodeGenFlags(); @@ -49,6 +52,10 @@ static lto::Config createConfig() { c.MAttrs = getMAttrs(); c.DiagHandler = diagnosticHandler; + c.PreCodeGenPassesHook = [](legacy::PassManager &pm) { +if (EnableGlobalMergeFunc) + pm.add(createGlobalMergeFuncPass()); + }; c.AlwaysEmitRegularLTOObj = !config->ltoObjPath.empty(); c.TimeTraceEnabled = config->timeTraceEnabled; diff --git a/lld/test/MachO/cgdata-generate-merge.s 
b/lld/test/MachO/cgdata-generate-merge.s new file mode 100644 index 00..3f7fb6777bc3cf --- /dev/null +++ b/lld/test/MachO/cgdata-generate-merge.s @@ -0,0 +1,85 @@ +# UNSUPPORTED: system-windows +# REQUIRES: aarch64 + +# RUN: rm -rf %t; split-file %s %t + +# Synthesize raw cgdata without the header (32 byte) from the indexed cgdata. +# RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata +# RUN: od -t x1 -j 32 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt +# RUN: sed "s//$(cat %t/raw-1-bytes.txt)/g" %t/merge-template.s > %t/merge-1.s +# RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata +# RUN: od -t x1 -j
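The codegenDataGenerate() change above follows a read-local, merge-into-global, finalize pattern. The sketch below mimics that flow with a toy record type. ToyMergeRecord and mergeAll are hypothetical, and the finalize() rule used here (drop hash groups with fewer than two functions, since they have nothing to merge with) is an assumption made for illustration; the actual StableFunctionMapRecord::finalize() may do something different.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical record: function names grouped by their stable hash.
struct ToyMergeRecord {
  std::map<std::uint64_t, std::vector<std::string>> HashToFuncs;

  // Append every group from Other into the matching group here.
  void merge(const ToyMergeRecord &Other) {
    for (const auto &[Hash, Funcs] : Other.HashToFuncs) {
      auto &Dst = HashToFuncs[Hash];
      Dst.insert(Dst.end(), Funcs.begin(), Funcs.end());
    }
  }

  // Assumed rule for this sketch: keep only groups with merge candidates.
  void finalize() {
    for (auto It = HashToFuncs.begin(); It != HashToFuncs.end();) {
      if (It->second.size() < 2)
        It = HashToFuncs.erase(It);
      else
        ++It;
    }
  }

  bool empty() const { return HashToFuncs.empty(); }
};

// Mirrors the loop shape in the patch: merge each local record, then finalize.
inline ToyMergeRecord mergeAll(const std::vector<ToyMergeRecord> &Locals) {
  ToyMergeRecord Global;
  for (const ToyMergeRecord &Local : Locals)
    Global.merge(Local);
  Global.finalize();
  return Global;
}
```

As in the patch, a caller would only hand the result to a writer when it is not empty.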
[llvm-branch-commits] [llvm] [CGData] Global Merge Functions (PR #112671)
https://github.com/kyulee-com updated https://github.com/llvm/llvm-project/pull/112671 >From ded5771bb4ff7c8fd5401b4efe0af988539a8162 Mon Sep 17 00:00:00 2001 From: Kyungwoo Lee Date: Fri, 30 Aug 2024 00:09:09 -0700 Subject: [PATCH] [CGData] Global Merge Functions --- llvm/include/llvm/CGData/CodeGenData.h| 11 + llvm/include/llvm/InitializePasses.h | 1 + llvm/include/llvm/LinkAllPasses.h | 1 + llvm/include/llvm/Passes/CodeGenPassBuilder.h | 1 + llvm/include/llvm/Transforms/IPO.h| 2 + .../Transforms/IPO/GlobalMergeFunctions.h | 77 ++ llvm/lib/CodeGen/TargetPassConfig.cpp | 3 + llvm/lib/LTO/LTO.cpp | 1 + llvm/lib/Transforms/IPO/CMakeLists.txt| 2 + .../Transforms/IPO/GlobalMergeFunctions.cpp | 687 ++ .../ThinLTO/AArch64/cgdata-merge-local.ll | 62 ++ .../test/ThinLTO/AArch64/cgdata-merge-read.ll | 82 +++ .../AArch64/cgdata-merge-two-rounds.ll| 68 ++ .../ThinLTO/AArch64/cgdata-merge-write.ll | 97 +++ llvm/tools/llvm-lto2/CMakeLists.txt | 1 + llvm/tools/llvm-lto2/llvm-lto2.cpp| 6 + 16 files changed, 1102 insertions(+) create mode 100644 llvm/include/llvm/Transforms/IPO/GlobalMergeFunctions.h create mode 100644 llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-local.ll create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-read.ll create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-two-rounds.ll create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-write.ll diff --git a/llvm/include/llvm/CGData/CodeGenData.h b/llvm/include/llvm/CGData/CodeGenData.h index 5d7c74725ccef1..da0e412f2a0e03 100644 --- a/llvm/include/llvm/CGData/CodeGenData.h +++ b/llvm/include/llvm/CGData/CodeGenData.h @@ -145,6 +145,9 @@ class CodeGenData { const OutlinedHashTree *getOutlinedHashTree() { return PublishedHashTree.get(); } + const StableFunctionMap *getStableFunctionMap() { +return PublishedStableFunctionMap.get(); + } /// Returns true if we should write codegen data. 
bool emitCGData() { return EmitCGData; } @@ -169,10 +172,18 @@ inline bool hasOutlinedHashTree() { return CodeGenData::getInstance().hasOutlinedHashTree(); } +inline bool hasStableFunctionMap() { + return CodeGenData::getInstance().hasStableFunctionMap(); +} + inline const OutlinedHashTree *getOutlinedHashTree() { return CodeGenData::getInstance().getOutlinedHashTree(); } +inline const StableFunctionMap *getStableFunctionMap() { + return CodeGenData::getInstance().getStableFunctionMap(); +} + inline bool emitCGData() { return CodeGenData::getInstance().emitCGData(); } inline void diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h index 4352099d6dbb99..9aa36d5bb7f801 100644 --- a/llvm/include/llvm/InitializePasses.h +++ b/llvm/include/llvm/InitializePasses.h @@ -123,6 +123,7 @@ void initializeGCEmptyBasicBlocksPass(PassRegistry &); void initializeGCMachineCodeAnalysisPass(PassRegistry &); void initializeGCModuleInfoPass(PassRegistry &); void initializeGVNLegacyPassPass(PassRegistry &); +void initializeGlobalMergeFuncPass(PassRegistry &); void initializeGlobalMergePass(PassRegistry &); void initializeGlobalsAAWrapperPassPass(PassRegistry &); void initializeHardwareLoopsLegacyPass(PassRegistry &); diff --git a/llvm/include/llvm/LinkAllPasses.h b/llvm/include/llvm/LinkAllPasses.h index 92b59a66567c95..ea3609a2b4bc71 100644 --- a/llvm/include/llvm/LinkAllPasses.h +++ b/llvm/include/llvm/LinkAllPasses.h @@ -79,6 +79,7 @@ struct ForcePassLinking { (void)llvm::createDomOnlyViewerWrapperPassPass(); (void)llvm::createDomViewerWrapperPassPass(); (void)llvm::createAlwaysInlinerLegacyPass(); +(void)llvm::createGlobalMergeFuncPass(); (void)llvm::createGlobalsAAWrapperPass(); (void)llvm::createInstSimplifyLegacyPass(); (void)llvm::createInstructionCombiningPass(); diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h index 13bc4700d87029..96b5b815132bc0 100644 --- 
a/llvm/include/llvm/Passes/CodeGenPassBuilder.h +++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h @@ -74,6 +74,7 @@ #include "llvm/Target/CGPassBuilderOption.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Transforms/CFGuard.h" +#include "llvm/Transforms/IPO/GlobalMergeFunctions.h" #include "llvm/Transforms/Scalar/ConstantHoisting.h" #include "llvm/Transforms/Scalar/LoopPassManager.h" #include "llvm/Transforms/Scalar/LoopStrengthReduce.h" diff --git a/llvm/include/llvm/Transforms/IPO.h b/llvm/include/llvm/Transforms/IPO.h index ee0e35aa618325..86a8654f56997c 100644 --- a/llvm/include/llvm/Transforms/IPO.h +++ b/llvm/include/llvm/Transforms/IPO.h @@ -55,6 +55,8 @@ enum class PassSummaryAction { Export, ///< Export information to summary. }; +Pass *createGlobalMergeFuncP
[llvm-branch-commits] [lld] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)
https://github.com/kyulee-com updated https://github.com/llvm/llvm-project/pull/112664

>From 09f1ec7730868a53cb566b0913e7952dfc15fa16 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee
Date: Mon, 9 Sep 2024 19:38:05 -0700
Subject: [PATCH] [CGData][llvm-cgdata] Support for stable function map

This introduces a new cgdata format for stable function maps.
The raw data is embedded in the __llvm_merge section during compile time.
This data can be read and merged using the llvm-cgdata tool, into an indexed
cgdata file. Consequently, the tool is now capable of handling either outlined
hash trees, stable function maps, or both, as they are orthogonal.
---
 lld/test/MachO/cgdata-generate.s | 6 +-
 llvm/docs/CommandGuide/llvm-cgdata.rst | 16 ++--
 llvm/include/llvm/CGData/CodeGenData.h | 24 +-
 llvm/include/llvm/CGData/CodeGenData.inc | 12 ++-
 llvm/include/llvm/CGData/CodeGenDataReader.h | 29 ++-
 llvm/include/llvm/CGData/CodeGenDataWriter.h | 17 +++-
 llvm/lib/CGData/CodeGenData.cpp | 30 ---
 llvm/lib/CGData/CodeGenDataReader.cpp | 63 +-
 llvm/lib/CGData/CodeGenDataWriter.cpp | 30 ++-
 llvm/test/tools/llvm-cgdata/empty.test | 8 +-
 llvm/test/tools/llvm-cgdata/error.test | 13 +--
 .../merge-combined-funcmap-hashtree.test | 66 +++
 .../llvm-cgdata/merge-funcmap-archive.test | 83 +++
 .../llvm-cgdata/merge-funcmap-concat.test | 78 +
 .../llvm-cgdata/merge-funcmap-double.test | 79 ++
 .../llvm-cgdata/merge-funcmap-single.test | 36
 ...chive.test => merge-hashtree-archive.test} | 8 +-
 ...concat.test => merge-hashtree-concat.test} | 6 +-
 ...double.test => merge-hashtree-double.test} | 8 +-
 ...single.test => merge-hashtree-single.test} | 4 +-
 llvm/tools/llvm-cgdata/llvm-cgdata.cpp | 48 ---
 21 files changed, 577 insertions(+), 87 deletions(-)
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-combined-funcmap-hashtree.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-archive.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-concat.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-double.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-single.test
 rename llvm/test/tools/llvm-cgdata/{merge-archive.test => merge-hashtree-archive.test} (91%)
 rename llvm/test/tools/llvm-cgdata/{merge-concat.test => merge-hashtree-concat.test} (93%)
 rename llvm/test/tools/llvm-cgdata/{merge-double.test => merge-hashtree-double.test} (90%)
 rename llvm/test/tools/llvm-cgdata/{merge-single.test => merge-hashtree-single.test} (92%)

diff --git a/lld/test/MachO/cgdata-generate.s b/lld/test/MachO/cgdata-generate.s
index 174df39d666c5d..f942ae07f64e0e 100644
--- a/lld/test/MachO/cgdata-generate.s
+++ b/lld/test/MachO/cgdata-generate.s
@@ -3,12 +3,12 @@

 # RUN: rm -rf %t; split-file %s %t

-# Synthesize raw cgdata without the header (24 byte) from the indexed cgdata.
+# Synthesize raw cgdata without the header (32 byte) from the indexed cgdata.
 # RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata
-# RUN: od -t x1 -j 24 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
+# RUN: od -t x1 -j 32 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
 # RUN: sed "s//$(cat %t/raw-1-bytes.txt)/g" %t/merge-template.s > %t/merge-1.s
 # RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata
-# RUN: od -t x1 -j 24 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt
+# RUN: od -t x1 -j 32 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt
 # RUN: sed "s//$(cat %t/raw-2-bytes.txt)/g" %t/merge-template.s > %t/merge-2.s

 # RUN: llvm-mc -filetype obj -triple arm64-apple-darwin %t/merge-1.s -o %t/merge-1.o
diff --git a/llvm/docs/CommandGuide/llvm-cgdata.rst b/llvm/docs/CommandGuide/llvm-cgdata.rst
index f592e1508844ee..0670decd087e39 100644
--- a/llvm/docs/CommandGuide/llvm-cgdata.rst
+++ b/llvm/docs/CommandGuide/llvm-cgdata.rst
@@ -11,15 +11,13 @@ SYNOPSIS
 DESCRIPTION
 ---

-The :program:llvm-cgdata utility parses raw codegen data embedded
-in compiled binary files and merges them into a single .cgdata file.
-It can also inspect and manipulate .cgdata files.
-Currently, the tool supports saving and restoring outlined hash trees,
-enabling global function outlining across modules, allowing for more
-efficient function outlining in subsequent compilations.
-The design is extensible, allowing for the incorporation of additional
-codegen summaries and optimization techniques, such as global function
-merging, in the futur
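The RUN lines in the test above drop the indexed cgdata header with `od -j 32` and turn the remaining bytes into a comma-separated `0x..` list. A minimal C++ sketch of that same transformation follows. The helper name `rawBytesAsHexList` and the constant `kAssumedHeaderSize` are hypothetical, not LLVM APIs, and the 32-byte header size is taken only from the test comment:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Header size as stated in the test comment; an assumption tied to this
// version of the indexed cgdata format, not a value read from LLVM headers.
constexpr std::size_t kAssumedHeaderSize = 32;

// Hypothetical helper (not an LLVM API): skip the header and render the
// remaining bytes as "0x..,0x..", roughly what the od | tr | sed pipeline emits.
std::string rawBytesAsHexList(const std::vector<std::uint8_t> &indexed,
                              std::size_t headerSize = kAssumedHeaderSize) {
  std::string out;
  for (std::size_t i = headerSize; i < indexed.size(); ++i) {
    char buf[8];
    std::snprintf(buf, sizeof(buf), "0x%02x",
                  static_cast<unsigned>(indexed[i]));
    if (!out.empty())
      out += ',';
    out += buf;
  }
  return out;
}
```

An input shorter than the header yields an empty string here; whether the real tool ever produces such a file is not something this sketch assumes.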
[llvm-branch-commits] [llvm] [CGData] Global Merge Functions (PR #112671)
https://github.com/kyulee-com updated https://github.com/llvm/llvm-project/pull/112671

>From 584a2d7fdadf91838b7b305a1b09056fcb0e805f Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee
Date: Fri, 30 Aug 2024 00:09:09 -0700
Subject: [PATCH] [CGData] Global Merge Functions

---
 llvm/include/llvm/CGData/CodeGenData.h | 11 +
 llvm/include/llvm/InitializePasses.h | 1 +
 llvm/include/llvm/LinkAllPasses.h | 1 +
 llvm/include/llvm/Passes/CodeGenPassBuilder.h | 1 +
 llvm/include/llvm/Transforms/IPO.h | 2 +
 .../Transforms/IPO/GlobalMergeFunctions.h | 73 ++
 llvm/lib/CodeGen/TargetPassConfig.cpp | 3 +
 llvm/lib/LTO/LTO.cpp | 1 +
 llvm/lib/Transforms/IPO/CMakeLists.txt | 2 +
 .../Transforms/IPO/GlobalMergeFunctions.cpp | 666 ++
 .../ThinLTO/AArch64/cgdata-merge-local.ll | 62 ++
 .../test/ThinLTO/AArch64/cgdata-merge-read.ll | 82 +++
 .../AArch64/cgdata-merge-two-rounds.ll | 68 ++
 .../ThinLTO/AArch64/cgdata-merge-write.ll | 97 +++
 llvm/tools/llvm-lto2/CMakeLists.txt | 1 +
 llvm/tools/llvm-lto2/llvm-lto2.cpp | 6 +
 16 files changed, 1077 insertions(+)
 create mode 100644 llvm/include/llvm/Transforms/IPO/GlobalMergeFunctions.h
 create mode 100644 llvm/lib/Transforms/IPO/GlobalMergeFunctions.cpp
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-local.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-read.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-two-rounds.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-merge-write.ll

diff --git a/llvm/include/llvm/CGData/CodeGenData.h b/llvm/include/llvm/CGData/CodeGenData.h
index 5d7c74725ccef1..da0e412f2a0e03 100644
--- a/llvm/include/llvm/CGData/CodeGenData.h
+++ b/llvm/include/llvm/CGData/CodeGenData.h
@@ -145,6 +145,9 @@ class CodeGenData {
   const OutlinedHashTree *getOutlinedHashTree() {
     return PublishedHashTree.get();
   }
+  const StableFunctionMap *getStableFunctionMap() {
+    return PublishedStableFunctionMap.get();
+  }

   /// Returns true if we should write codegen data.
   bool emitCGData() { return EmitCGData; }
@@ -169,10 +172,18 @@ inline bool hasOutlinedHashTree() {
   return CodeGenData::getInstance().hasOutlinedHashTree();
 }

+inline bool hasStableFunctionMap() {
+  return CodeGenData::getInstance().hasStableFunctionMap();
+}
+
 inline const OutlinedHashTree *getOutlinedHashTree() {
   return CodeGenData::getInstance().getOutlinedHashTree();
 }

+inline const StableFunctionMap *getStableFunctionMap() {
+  return CodeGenData::getInstance().getStableFunctionMap();
+}
+
 inline bool emitCGData() { return CodeGenData::getInstance().emitCGData(); }

 inline void
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 4352099d6dbb99..9aa36d5bb7f801 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -123,6 +123,7 @@ void initializeGCEmptyBasicBlocksPass(PassRegistry &);
 void initializeGCMachineCodeAnalysisPass(PassRegistry &);
 void initializeGCModuleInfoPass(PassRegistry &);
 void initializeGVNLegacyPassPass(PassRegistry &);
+void initializeGlobalMergeFuncPass(PassRegistry &);
 void initializeGlobalMergePass(PassRegistry &);
 void initializeGlobalsAAWrapperPassPass(PassRegistry &);
 void initializeHardwareLoopsLegacyPass(PassRegistry &);
diff --git a/llvm/include/llvm/LinkAllPasses.h b/llvm/include/llvm/LinkAllPasses.h
index 92b59a66567c95..ea3609a2b4bc71 100644
--- a/llvm/include/llvm/LinkAllPasses.h
+++ b/llvm/include/llvm/LinkAllPasses.h
@@ -79,6 +79,7 @@ struct ForcePassLinking {
     (void)llvm::createDomOnlyViewerWrapperPassPass();
     (void)llvm::createDomViewerWrapperPassPass();
     (void)llvm::createAlwaysInlinerLegacyPass();
+    (void)llvm::createGlobalMergeFuncPass();
     (void)llvm::createGlobalsAAWrapperPass();
     (void)llvm::createInstSimplifyLegacyPass();
     (void)llvm::createInstructionCombiningPass();
diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index 13bc4700d87029..96b5b815132bc0 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -74,6 +74,7 @@
 #include "llvm/Target/CGPassBuilderOption.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/CFGuard.h"
+#include "llvm/Transforms/IPO/GlobalMergeFunctions.h"
 #include "llvm/Transforms/Scalar/ConstantHoisting.h"
 #include "llvm/Transforms/Scalar/LoopPassManager.h"
 #include "llvm/Transforms/Scalar/LoopStrengthReduce.h"
diff --git a/llvm/include/llvm/Transforms/IPO.h b/llvm/include/llvm/Transforms/IPO.h
index ee0e35aa618325..86a8654f56997c 100644
--- a/llvm/include/llvm/Transforms/IPO.h
+++ b/llvm/include/llvm/Transforms/IPO.h
@@ -55,6 +55,8 @@ enum class PassSummaryAction {
   Export, ///< Export information to summary.
 };

+Pass *createGlobalMergeFuncP
[llvm-branch-commits] [lld] [CGData][lld-macho] Add Global Merge Func Pass (PR #112674)
https://github.com/kyulee-com updated https://github.com/llvm/llvm-project/pull/112674

>From ead1aee8eeb4046ec0641c09652cea726becd48a Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee
Date: Wed, 16 Oct 2024 22:56:38 -0700
Subject: [PATCH] [CGData][lld-macho] Add Global Merge Func Pass

---
 lld/MachO/CMakeLists.txt | 2 +
 lld/MachO/Driver.cpp | 18 +-
 lld/MachO/InputSection.h | 1 +
 lld/MachO/LTO.cpp | 7 +++
 lld/test/MachO/cgdata-generate-merge.s | 85 ++
 5 files changed, 112 insertions(+), 1 deletion(-)
 create mode 100644 lld/test/MachO/cgdata-generate-merge.s

diff --git a/lld/MachO/CMakeLists.txt b/lld/MachO/CMakeLists.txt
index ecf6ce609e59f2..137fe4939b4457 100644
--- a/lld/MachO/CMakeLists.txt
+++ b/lld/MachO/CMakeLists.txt
@@ -41,9 +41,11 @@ add_lld_library(lldMachO
   BitReader
   BitWriter
   CGData
+  CodeGen
   Core
   DebugInfoDWARF
   Demangle
+  IPO
   LTO
   MC
   ObjCARCOpts
diff --git a/lld/MachO/Driver.cpp b/lld/MachO/Driver.cpp
index ab4abb1fa97efc..59c24a06a2cb20 100644
--- a/lld/MachO/Driver.cpp
+++ b/lld/MachO/Driver.cpp
@@ -1326,7 +1326,8 @@ static void codegenDataGenerate() {
   TimeTraceScope timeScope("Generating codegen data");

   OutlinedHashTreeRecord globalOutlineRecord;
-  for (ConcatInputSection *isec : inputSections)
+  StableFunctionMapRecord globalMergeRecord;
+  for (ConcatInputSection *isec : inputSections) {
     if (isec->getSegName() == segment_names::data &&
         isec->getName() == section_names::outlinedHashTree) {
       // Read outlined hash tree from each section.
@@ -1337,10 +1338,25 @@ static void codegenDataGenerate() {
       // Merge it to the global hash tree.
       globalOutlineRecord.merge(localOutlineRecord);
     }
+    if (isec->getSegName() == segment_names::data &&
+        isec->getName() == section_names::functionmap) {
+      // Read stable functions from each section.
+      StableFunctionMapRecord localMergeRecord;
+      auto *data = isec->data.data();
+      localMergeRecord.deserialize(data);
+
+      // Merge it to the global function map.
+      globalMergeRecord.merge(localMergeRecord);
+    }
+  }
+
+  globalMergeRecord.finalize();

   CodeGenDataWriter Writer;
   if (!globalOutlineRecord.empty())
     Writer.addRecord(globalOutlineRecord);
+  if (!globalMergeRecord.empty())
+    Writer.addRecord(globalMergeRecord);

   std::error_code EC;
   auto fileName = config->codegenDataGeneratePath;
diff --git a/lld/MachO/InputSection.h b/lld/MachO/InputSection.h
index 7ef0e31066f372..b86520d36cda5b 100644
--- a/lld/MachO/InputSection.h
+++ b/lld/MachO/InputSection.h
@@ -339,6 +339,7 @@ constexpr const char const_[] = "__const";
 constexpr const char lazySymbolPtr[] = "__la_symbol_ptr";
 constexpr const char lazyBinding[] = "__lazy_binding";
 constexpr const char literals[] = "__literals";
+constexpr const char functionmap[] = "__llvm_merge";
 constexpr const char moduleInitFunc[] = "__mod_init_func";
 constexpr const char moduleTermFunc[] = "__mod_term_func";
 constexpr const char nonLazySymbolPtr[] = "__nl_symbol_ptr";
diff --git a/lld/MachO/LTO.cpp b/lld/MachO/LTO.cpp
index 28f5290edb58e3..9bddf9a6445f6d 100644
--- a/lld/MachO/LTO.cpp
+++ b/lld/MachO/LTO.cpp
@@ -25,6 +25,7 @@
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/raw_ostream.h"
+#include "llvm/Transforms/IPO.h"
 #include "llvm/Transforms/ObjCARC.h"

 using namespace lld;
@@ -38,6 +39,8 @@ static std::string getThinLTOOutputFile(StringRef modulePath) {
                              config->thinLTOPrefixReplaceNew);
 }

+extern cl::opt EnableGlobalMergeFunc;
+
 static lto::Config createConfig() {
   lto::Config c;
   c.Options = initTargetOptionsFromCodeGenFlags();
@@ -49,6 +52,10 @@ static lto::Config createConfig() {
   c.MAttrs = getMAttrs();
   c.DiagHandler = diagnosticHandler;

+  c.PreCodeGenPassesHook = [](legacy::PassManager &pm) {
+    if (EnableGlobalMergeFunc)
+      pm.add(createGlobalMergeFuncPass());
+  };
   c.AlwaysEmitRegularLTOObj = !config->ltoObjPath.empty();

   c.TimeTraceEnabled = config->timeTraceEnabled;
diff --git a/lld/test/MachO/cgdata-generate-merge.s b/lld/test/MachO/cgdata-generate-merge.s
new file mode 100644
index 00..3f7fb6777bc3cf
--- /dev/null
+++ b/lld/test/MachO/cgdata-generate-merge.s
@@ -0,0 +1,85 @@
+# UNSUPPORTED: system-windows
+# REQUIRES: aarch64
+
+# RUN: rm -rf %t; split-file %s %t
+
+# Synthesize raw cgdata without the header (32 byte) from the indexed cgdata.
+# RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata
+# RUN: od -t x1 -j 32 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
+# RUN: sed "s//$(cat %t/raw-1-bytes.txt)/g" %t/merge-template.s > %t/merge-1.s
+# RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata
+# RUN: od -t x1 -j
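The `codegenDataGenerate()` change in this patch follows a simple pattern: visit each input section, pick the ones named `__llvm_merge`, and fold each local record into one global record. A toy C++ sketch of that loop shape is below. `ToyRecord`, `ToySection` and `collectMergeRecords` are hypothetical stand-ins, not the real `StableFunctionMapRecord` or lld types, and the `"__DATA"` segment string is an assumption about what `segment_names::data` expands to:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-section record; the real
// StableFunctionMapRecord has a different layout and API.
struct ToyRecord {
  std::map<std::uint64_t, std::vector<std::string>> hashToNames;

  bool empty() const { return hashToNames.empty(); }

  // Fold another record's entries into this one, appending names per hash.
  void merge(const ToyRecord &other) {
    for (const auto &entry : other.hashToNames) {
      std::vector<std::string> &dst = hashToNames[entry.first];
      dst.insert(dst.end(), entry.second.begin(), entry.second.end());
    }
  }
};

// Hypothetical section descriptor: segment name, section name, and an
// already-decoded payload (the patch deserializes raw bytes instead).
struct ToySection {
  std::string segment;
  std::string name;
  ToyRecord payload;
};

// Same loop shape as the patch: filter by segment/section name, then merge
// each local record into the global one.
ToyRecord collectMergeRecords(const std::vector<ToySection> &sections) {
  ToyRecord global;
  for (const ToySection &sec : sections) {
    if (sec.segment == "__DATA" && sec.name == "__llvm_merge")
      global.merge(sec.payload);
  }
  return global;
}
```

The patch additionally calls `finalize()` on the merged record and writes it only when it is non-empty; the sketch leaves both steps out because their behavior is not visible in this excerpt.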
[llvm-branch-commits] [lld] [CGData][lld-macho] Add Global Merge Func Pass (PR #112674)
https://github.com/kyulee-com updated https://github.com/llvm/llvm-project/pull/112674

>From 549cf5d3880450641c720a6bc1f3bddae272f902 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee
Date: Wed, 16 Oct 2024 22:56:38 -0700
Subject: [PATCH] [CGData][lld-macho] Add Global Merge Func Pass

---
 lld/MachO/CMakeLists.txt | 2 +
 lld/MachO/Driver.cpp | 18 +-
 lld/MachO/InputSection.h | 1 +
 lld/MachO/LTO.cpp | 7 +++
 lld/test/MachO/cgdata-generate-merge.s | 85 ++
 5 files changed, 112 insertions(+), 1 deletion(-)
 create mode 100644 lld/test/MachO/cgdata-generate-merge.s

diff --git a/lld/MachO/CMakeLists.txt b/lld/MachO/CMakeLists.txt
index ecf6ce609e59f2..137fe4939b4457 100644
--- a/lld/MachO/CMakeLists.txt
+++ b/lld/MachO/CMakeLists.txt
@@ -41,9 +41,11 @@ add_lld_library(lldMachO
   BitReader
   BitWriter
   CGData
+  CodeGen
   Core
   DebugInfoDWARF
   Demangle
+  IPO
   LTO
   MC
   ObjCARCOpts
diff --git a/lld/MachO/Driver.cpp b/lld/MachO/Driver.cpp
index ab4abb1fa97efc..59c24a06a2cb20 100644
--- a/lld/MachO/Driver.cpp
+++ b/lld/MachO/Driver.cpp
@@ -1326,7 +1326,8 @@ static void codegenDataGenerate() {
   TimeTraceScope timeScope("Generating codegen data");

   OutlinedHashTreeRecord globalOutlineRecord;
-  for (ConcatInputSection *isec : inputSections)
+  StableFunctionMapRecord globalMergeRecord;
+  for (ConcatInputSection *isec : inputSections) {
     if (isec->getSegName() == segment_names::data &&
         isec->getName() == section_names::outlinedHashTree) {
       // Read outlined hash tree from each section.
@@ -1337,10 +1338,25 @@ static void codegenDataGenerate() {
       // Merge it to the global hash tree.
       globalOutlineRecord.merge(localOutlineRecord);
     }
+    if (isec->getSegName() == segment_names::data &&
+        isec->getName() == section_names::functionmap) {
+      // Read stable functions from each section.
+      StableFunctionMapRecord localMergeRecord;
+      auto *data = isec->data.data();
+      localMergeRecord.deserialize(data);
+
+      // Merge it to the global function map.
+      globalMergeRecord.merge(localMergeRecord);
+    }
+  }
+
+  globalMergeRecord.finalize();

   CodeGenDataWriter Writer;
   if (!globalOutlineRecord.empty())
     Writer.addRecord(globalOutlineRecord);
+  if (!globalMergeRecord.empty())
+    Writer.addRecord(globalMergeRecord);

   std::error_code EC;
   auto fileName = config->codegenDataGeneratePath;
diff --git a/lld/MachO/InputSection.h b/lld/MachO/InputSection.h
index 7ef0e31066f372..b86520d36cda5b 100644
--- a/lld/MachO/InputSection.h
+++ b/lld/MachO/InputSection.h
@@ -339,6 +339,7 @@ constexpr const char const_[] = "__const";
 constexpr const char lazySymbolPtr[] = "__la_symbol_ptr";
 constexpr const char lazyBinding[] = "__lazy_binding";
 constexpr const char literals[] = "__literals";
+constexpr const char functionmap[] = "__llvm_merge";
 constexpr const char moduleInitFunc[] = "__mod_init_func";
 constexpr const char moduleTermFunc[] = "__mod_term_func";
 constexpr const char nonLazySymbolPtr[] = "__nl_symbol_ptr";
diff --git a/lld/MachO/LTO.cpp b/lld/MachO/LTO.cpp
index 28f5290edb58e3..9bddf9a6445f6d 100644
--- a/lld/MachO/LTO.cpp
+++ b/lld/MachO/LTO.cpp
@@ -25,6 +25,7 @@
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/raw_ostream.h"
+#include "llvm/Transforms/IPO.h"
 #include "llvm/Transforms/ObjCARC.h"

 using namespace lld;
@@ -38,6 +39,8 @@ static std::string getThinLTOOutputFile(StringRef modulePath) {
                              config->thinLTOPrefixReplaceNew);
 }

+extern cl::opt EnableGlobalMergeFunc;
+
 static lto::Config createConfig() {
   lto::Config c;
   c.Options = initTargetOptionsFromCodeGenFlags();
@@ -49,6 +52,10 @@ static lto::Config createConfig() {
   c.MAttrs = getMAttrs();
   c.DiagHandler = diagnosticHandler;

+  c.PreCodeGenPassesHook = [](legacy::PassManager &pm) {
+    if (EnableGlobalMergeFunc)
+      pm.add(createGlobalMergeFuncPass());
+  };
   c.AlwaysEmitRegularLTOObj = !config->ltoObjPath.empty();

   c.TimeTraceEnabled = config->timeTraceEnabled;
diff --git a/lld/test/MachO/cgdata-generate-merge.s b/lld/test/MachO/cgdata-generate-merge.s
new file mode 100644
index 00..3f7fb6777bc3cf
--- /dev/null
+++ b/lld/test/MachO/cgdata-generate-merge.s
@@ -0,0 +1,85 @@
+# UNSUPPORTED: system-windows
+# REQUIRES: aarch64
+
+# RUN: rm -rf %t; split-file %s %t
+
+# Synthesize raw cgdata without the header (32 byte) from the indexed cgdata.
+# RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata
+# RUN: od -t x1 -j 32 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
+# RUN: sed "s//$(cat %t/raw-1-bytes.txt)/g" %t/merge-template.s > %t/merge-1.s
+# RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata
+# RUN: od -t x1 -j
[llvm-branch-commits] [llvm] [CGData] Global Merge Functions (PR #112671)
https://github.com/kyulee-com edited https://github.com/llvm/llvm-project/pull/112671 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)
https://github.com/kyulee-com edited https://github.com/llvm/llvm-project/pull/112638
[llvm-branch-commits] [lld] [CGData][lld-macho] Add Global Merge Func Pass (PR #112674)
https://github.com/kyulee-com edited https://github.com/llvm/llvm-project/pull/112674
[llvm-branch-commits] [lld] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)
https://github.com/kyulee-com edited https://github.com/llvm/llvm-project/pull/112664
[llvm-branch-commits] [llvm] [StructuralHash] Support Differences (PR #112638)
https://github.com/kyulee-com edited https://github.com/llvm/llvm-project/pull/112638
[llvm-branch-commits] [llvm] [CGData] Stable Function Map (PR #112662)
https://github.com/kyulee-com edited https://github.com/llvm/llvm-project/pull/112662
[llvm-branch-commits] [lld] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)
https://github.com/kyulee-com updated https://github.com/llvm/llvm-project/pull/112664

>From c7913f9fff736da4cc6a78a17e41dc539bc75e8a Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee
Date: Mon, 9 Sep 2024 19:38:05 -0700
Subject: [PATCH 1/2] [CGData][llvm-cgdata] Support for stable function map

This introduces a new cgdata format for stable function maps.
The raw data is embedded in the __llvm_merge section during compile time.
This data can be read and merged using the llvm-cgdata tool, into an indexed
cgdata file. Consequently, the tool is now capable of handling either outlined
hash trees, stable function maps, or both, as they are orthogonal.
---
 lld/test/MachO/cgdata-generate.s | 6 +-
 llvm/docs/CommandGuide/llvm-cgdata.rst | 16 ++--
 llvm/include/llvm/CGData/CodeGenData.h | 24 +-
 llvm/include/llvm/CGData/CodeGenData.inc | 12 ++-
 llvm/include/llvm/CGData/CodeGenDataReader.h | 29 ++-
 llvm/include/llvm/CGData/CodeGenDataWriter.h | 17 +++-
 llvm/lib/CGData/CodeGenData.cpp | 30 ---
 llvm/lib/CGData/CodeGenDataReader.cpp | 63 +-
 llvm/lib/CGData/CodeGenDataWriter.cpp | 30 ++-
 llvm/test/tools/llvm-cgdata/empty.test | 8 +-
 llvm/test/tools/llvm-cgdata/error.test | 13 +--
 .../merge-combined-funcmap-hashtree.test | 66 +++
 .../llvm-cgdata/merge-funcmap-archive.test | 83 +++
 .../llvm-cgdata/merge-funcmap-concat.test | 78 +
 .../llvm-cgdata/merge-funcmap-double.test | 79 ++
 .../llvm-cgdata/merge-funcmap-single.test | 36
 ...chive.test => merge-hashtree-archive.test} | 8 +-
 ...concat.test => merge-hashtree-concat.test} | 6 +-
 ...double.test => merge-hashtree-double.test} | 8 +-
 ...single.test => merge-hashtree-single.test} | 4 +-
 llvm/tools/llvm-cgdata/llvm-cgdata.cpp | 48 ---
 21 files changed, 577 insertions(+), 87 deletions(-)
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-combined-funcmap-hashtree.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-archive.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-concat.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-double.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-single.test
 rename llvm/test/tools/llvm-cgdata/{merge-archive.test => merge-hashtree-archive.test} (91%)
 rename llvm/test/tools/llvm-cgdata/{merge-concat.test => merge-hashtree-concat.test} (93%)
 rename llvm/test/tools/llvm-cgdata/{merge-double.test => merge-hashtree-double.test} (90%)
 rename llvm/test/tools/llvm-cgdata/{merge-single.test => merge-hashtree-single.test} (92%)

diff --git a/lld/test/MachO/cgdata-generate.s b/lld/test/MachO/cgdata-generate.s
index 174df39d666c5d..f942ae07f64e0e 100644
--- a/lld/test/MachO/cgdata-generate.s
+++ b/lld/test/MachO/cgdata-generate.s
@@ -3,12 +3,12 @@

 # RUN: rm -rf %t; split-file %s %t

-# Synthesize raw cgdata without the header (24 byte) from the indexed cgdata.
+# Synthesize raw cgdata without the header (32 byte) from the indexed cgdata.
 # RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata
-# RUN: od -t x1 -j 24 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
+# RUN: od -t x1 -j 32 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
 # RUN: sed "s//$(cat %t/raw-1-bytes.txt)/g" %t/merge-template.s > %t/merge-1.s
 # RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata
-# RUN: od -t x1 -j 24 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt
+# RUN: od -t x1 -j 32 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt
 # RUN: sed "s//$(cat %t/raw-2-bytes.txt)/g" %t/merge-template.s > %t/merge-2.s

 # RUN: llvm-mc -filetype obj -triple arm64-apple-darwin %t/merge-1.s -o %t/merge-1.o
diff --git a/llvm/docs/CommandGuide/llvm-cgdata.rst b/llvm/docs/CommandGuide/llvm-cgdata.rst
index f592e1508844ee..0670decd087e39 100644
--- a/llvm/docs/CommandGuide/llvm-cgdata.rst
+++ b/llvm/docs/CommandGuide/llvm-cgdata.rst
@@ -11,15 +11,13 @@ SYNOPSIS
 DESCRIPTION
 ---

-The :program:llvm-cgdata utility parses raw codegen data embedded
-in compiled binary files and merges them into a single .cgdata file.
-It can also inspect and manipulate .cgdata files.
-Currently, the tool supports saving and restoring outlined hash trees,
-enabling global function outlining across modules, allowing for more
-efficient function outlining in subsequent compilations.
-The design is extensible, allowing for the incorporation of additional
-codegen summaries and optimization techniques, such as global function
-merging, in the f
[llvm-branch-commits] [lld] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)
https://github.com/kyulee-com ready_for_review https://github.com/llvm/llvm-project/pull/112664
[llvm-branch-commits] [lld] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)
https://github.com/kyulee-com edited https://github.com/llvm/llvm-project/pull/112664
[llvm-branch-commits] [lld] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)
https://github.com/kyulee-com updated https://github.com/llvm/llvm-project/pull/112664

>From c7913f9fff736da4cc6a78a17e41dc539bc75e8a Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee
Date: Mon, 9 Sep 2024 19:38:05 -0700
Subject: [PATCH] [CGData][llvm-cgdata] Support for stable function map

This introduces a new cgdata format for stable function maps.
The raw data is embedded in the __llvm_merge section during compile time.
This data can be read and merged using the llvm-cgdata tool, into an indexed
cgdata file. Consequently, the tool is now capable of handling either outlined
hash trees, stable function maps, or both, as they are orthogonal.
---
 lld/test/MachO/cgdata-generate.s | 6 +-
 llvm/docs/CommandGuide/llvm-cgdata.rst | 16 ++--
 llvm/include/llvm/CGData/CodeGenData.h | 24 +-
 llvm/include/llvm/CGData/CodeGenData.inc | 12 ++-
 llvm/include/llvm/CGData/CodeGenDataReader.h | 29 ++-
 llvm/include/llvm/CGData/CodeGenDataWriter.h | 17 +++-
 llvm/lib/CGData/CodeGenData.cpp | 30 ---
 llvm/lib/CGData/CodeGenDataReader.cpp | 63 +-
 llvm/lib/CGData/CodeGenDataWriter.cpp | 30 ++-
 llvm/test/tools/llvm-cgdata/empty.test | 8 +-
 llvm/test/tools/llvm-cgdata/error.test | 13 +--
 .../merge-combined-funcmap-hashtree.test | 66 +++
 .../llvm-cgdata/merge-funcmap-archive.test | 83 +++
 .../llvm-cgdata/merge-funcmap-concat.test | 78 +
 .../llvm-cgdata/merge-funcmap-double.test | 79 ++
 .../llvm-cgdata/merge-funcmap-single.test | 36
 ...chive.test => merge-hashtree-archive.test} | 8 +-
 ...concat.test => merge-hashtree-concat.test} | 6 +-
 ...double.test => merge-hashtree-double.test} | 8 +-
 ...single.test => merge-hashtree-single.test} | 4 +-
 llvm/tools/llvm-cgdata/llvm-cgdata.cpp | 48 ---
 21 files changed, 577 insertions(+), 87 deletions(-)
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-combined-funcmap-hashtree.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-archive.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-concat.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-double.test
 create mode 100644 llvm/test/tools/llvm-cgdata/merge-funcmap-single.test
 rename llvm/test/tools/llvm-cgdata/{merge-archive.test => merge-hashtree-archive.test} (91%)
 rename llvm/test/tools/llvm-cgdata/{merge-concat.test => merge-hashtree-concat.test} (93%)
 rename llvm/test/tools/llvm-cgdata/{merge-double.test => merge-hashtree-double.test} (90%)
 rename llvm/test/tools/llvm-cgdata/{merge-single.test => merge-hashtree-single.test} (92%)

diff --git a/lld/test/MachO/cgdata-generate.s b/lld/test/MachO/cgdata-generate.s
index 174df39d666c5d..f942ae07f64e0e 100644
--- a/lld/test/MachO/cgdata-generate.s
+++ b/lld/test/MachO/cgdata-generate.s
@@ -3,12 +3,12 @@

 # RUN: rm -rf %t; split-file %s %t

-# Synthesize raw cgdata without the header (24 byte) from the indexed cgdata.
+# Synthesize raw cgdata without the header (32 byte) from the indexed cgdata.
 # RUN: llvm-cgdata --convert --format binary %t/raw-1.cgtext -o %t/raw-1.cgdata
-# RUN: od -t x1 -j 24 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
+# RUN: od -t x1 -j 32 -An %t/raw-1.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-1-bytes.txt
 # RUN: sed "s//$(cat %t/raw-1-bytes.txt)/g" %t/merge-template.s > %t/merge-1.s
 # RUN: llvm-cgdata --convert --format binary %t/raw-2.cgtext -o %t/raw-2.cgdata
-# RUN: od -t x1 -j 24 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt
+# RUN: od -t x1 -j 32 -An %t/raw-2.cgdata | tr -d '\n\r\t' | sed 's/[ ][ ]*/ /g; s/^[ ]*//; s/[ ]*$//; s/[ ]/,0x/g; s/^/0x/' > %t/raw-2-bytes.txt
 # RUN: sed "s//$(cat %t/raw-2-bytes.txt)/g" %t/merge-template.s > %t/merge-2.s

 # RUN: llvm-mc -filetype obj -triple arm64-apple-darwin %t/merge-1.s -o %t/merge-1.o
diff --git a/llvm/docs/CommandGuide/llvm-cgdata.rst b/llvm/docs/CommandGuide/llvm-cgdata.rst
index f592e1508844ee..0670decd087e39 100644
--- a/llvm/docs/CommandGuide/llvm-cgdata.rst
+++ b/llvm/docs/CommandGuide/llvm-cgdata.rst
@@ -11,15 +11,13 @@ SYNOPSIS
 DESCRIPTION
 ---

-The :program:llvm-cgdata utility parses raw codegen data embedded
-in compiled binary files and merges them into a single .cgdata file.
-It can also inspect and manipulate .cgdata files.
-Currently, the tool supports saving and restoring outlined hash trees,
-enabling global function outlining across modules, allowing for more
-efficient function outlining in subsequent compilations.
-The design is extensible, allowing for the incorporation of additional
-codegen summaries and optimization techniques, such as global function
-merging, in the futur
[llvm-branch-commits] [lld] [llvm] [CGData][llvm-cgdata] Support for stable function map (PR #112664)
kyulee-com wrote: cc. @nocchijiang https://github.com/llvm/llvm-project/pull/112664
[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)
https://github.com/kyulee-com created https://github.com/llvm/llvm-project/pull/115750

None

>From 70dcb2ccba98b392c3539f349ccf7fec284a674c Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee
Date: Mon, 11 Nov 2024 10:06:56 -0800
Subject: [PATCH] [CGData] Refactor Global Merge Functions

---
 llvm/lib/CodeGen/GlobalMergeFunctions.cpp | 148 +-
 1 file changed, 59 insertions(+), 89 deletions(-)

diff --git a/llvm/lib/CodeGen/GlobalMergeFunctions.cpp b/llvm/lib/CodeGen/GlobalMergeFunctions.cpp
index 2b367ca87d9008..df8dbb8a73b95d 100644
--- a/llvm/lib/CodeGen/GlobalMergeFunctions.cpp
+++ b/llvm/lib/CodeGen/GlobalMergeFunctions.cpp
@@ -31,14 +31,6 @@ static cl::opt DisableCGDataForMerging(
              "merging is still enabled within a module."),
     cl::init(false));

-STATISTIC(NumMismatchedFunctionHash,
-          "Number of mismatched function hash for global merge function");
-STATISTIC(NumMismatchedInstCount,
-          "Number of mismatched instruction count for global merge function");
-STATISTIC(NumMismatchedConstHash,
-          "Number of mismatched const hash for global merge function");
-STATISTIC(NumMismatchedModuleId,
-          "Number of mismatched Module Id for global merge function");
 STATISTIC(NumMergedFunctions,
           "Number of functions that are actually merged using function hash");
 STATISTIC(NumAnalyzedModues, "Number of modules that are analyzed");
@@ -203,9 +195,9 @@ void GlobalMergeFunc::analyze(Module &M) {
 struct FuncMergeInfo {
   StableFunctionMap::StableFunctionEntry *SF;
   Function *F;
-  std::unique_ptr IndexInstruction;
+  IndexInstrMap *IndexInstruction;
   FuncMergeInfo(StableFunctionMap::StableFunctionEntry *SF, Function *F,
-                std::unique_ptr IndexInstruction)
+                IndexInstrMap *IndexInstruction)
       : SF(SF), F(F), IndexInstruction(std::move(IndexInstruction)) {}
 };

@@ -420,101 +412,79 @@ static ParamLocsVecTy computeParamInfo(
 bool GlobalMergeFunc::merge(Module &M, const StableFunctionMap *FunctionMap) {
   bool Changed = false;
-  // Build a map from stable function name to function.
-  StringMap StableNameToFuncMap;
-  for (auto &F : M)
-    StableNameToFuncMap[get_stable_name(F.getName())] = &F;
-  // Track merged functions
-  DenseSet MergedFunctions;
-
-  auto ModId = M.getModuleIdentifier();
-  for (auto &[Hash, SFS] : FunctionMap->getFunctionMap()) {
-    // Parameter locations based on the unique hash sequences
-    // across the candidates.
+  // Collect stable functions related to the current module.
+  DenseMap> HashToFuncs;
+  DenseMap FuncToFI;
+  auto &Maps = FunctionMap->getFunctionMap();
+  for (auto &F : M) {
+    if (!isEligibleFunction(&F))
+      continue;
+    auto FI = llvm::StructuralHashWithDifferences(F, ignoreOp);
+    if (Maps.contains(FI.FunctionHash)) {
+      HashToFuncs[FI.FunctionHash].push_back(&F);
+      FuncToFI.try_emplace(&F, std::move(FI));
+    }
+  }
+
+  for (auto &[Hash, Funcs] : HashToFuncs) {
     std::optional ParamLocsVec;
-    Function *MergedFunc = nullptr;
-    std::string MergedModId;
     SmallVector FuncMergeInfos;
-    for (auto &SF : SFS) {
-      // Get the function from the stable name.
-      auto I = StableNameToFuncMap.find(
-          *FunctionMap->getNameForId(SF->FunctionNameId));
-      if (I == StableNameToFuncMap.end())
-        continue;
-      Function *F = I->second;
-      assert(F);
-      // Skip if the function has been merged before.
-      if (MergedFunctions.count(F))
-        continue;
-      // Consider the function if it is eligible for merging.
-      if (!isEligibleFunction(F))
-        continue;
-      auto FI = llvm::StructuralHashWithDifferences(*F, ignoreOp);
-      uint64_t FuncHash = FI.FunctionHash;
-      if (Hash != FuncHash) {
-        ++NumMismatchedFunctionHash;
-        continue;
-      }
+    // Iterate functions with the same hash.
+    for (auto &F : Funcs) {
+      auto &SFS = Maps.at(Hash);
+      auto &FI = FuncToFI.at(F);
-      if (SF->InstCount != FI.IndexInstruction->size()) {
-        ++NumMismatchedInstCount;
+      // Check if the function is compatible with any stable function
+      // in terms of the number of instructions and ignored operands.
+      assert(!SFS.empty());
+      auto &RFS = SFS[0];
+      if (RFS->InstCount != FI.IndexInstruction->size())
         continue;
-      }
-      bool HasValidSharedConst = true;
-      for (auto &[Index, Hash] : *SF->IndexOperandHashMap) {
-        auto [InstIndex, OpndIndex] = Index;
-        assert(InstIndex < FI.IndexInstruction->size());
-        auto *Inst = FI.IndexInstruction->lookup(InstIndex);
-        if (!ignoreOp(Inst, OpndIndex)) {
-          HasValidSharedConst = false;
-          break;
-        }
-      }
-      if (!HasValidSharedConst) {
-        ++NumMismatchedConstHash;
-        continue;
-      }
-      if (!checkConstHashCompatible(*SF->IndexOperandHashMap,
-                                    *FI.IndexOperandHashMap)) {
-
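The added lines in `GlobalMergeFunc::merge` replace the name-based lookup with a first pass that hashes each eligible function in the module and groups the functions whose hash already appears in the stable function map. A toy C++ sketch of that grouping step follows. `ToyFunction` and `groupByKnownHash` are hypothetical stand-ins, the hash is precomputed rather than produced by `StructuralHashWithDifferences`, and a plain set of hashes stands in for the stable function map:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a function in the module: a name, a precomputed
// structural hash, and an eligibility flag.
struct ToyFunction {
  std::string name;
  std::uint64_t hash;
  bool eligible;
};

// Group eligible functions by hash, keeping only hashes present in
// knownHashes (a stand-in for the keys of the stable function map).
std::unordered_map<std::uint64_t, std::vector<const ToyFunction *>>
groupByKnownHash(const std::vector<ToyFunction> &module,
                 const std::unordered_set<std::uint64_t> &knownHashes) {
  std::unordered_map<std::uint64_t, std::vector<const ToyFunction *>> groups;
  for (const ToyFunction &f : module) {
    if (!f.eligible)
      continue; // mirrors the isEligibleFunction() early-out in the patch
    if (knownHashes.count(f.hash))
      groups[f.hash].push_back(&f);
  }
  return groups;
}
```

Each group then corresponds to one iteration of the patch's `for (auto &[Hash, Funcs] : HashToFuncs)` loop; the per-function compatibility checks that follow in the patch are not modelled here.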
[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)
https://github.com/kyulee-com edited https://github.com/llvm/llvm-project/pull/115750
[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)
https://github.com/kyulee-com updated https://github.com/llvm/llvm-project/pull/115750 >From 70dcb2ccba98b392c3539f349ccf7fec284a674c Mon Sep 17 00:00:00 2001 From: Kyungwoo Lee Date: Mon, 11 Nov 2024 10:06:56 -0800 Subject: [PATCH 1/2] [CGData] Refactor Global Merge Functions --- llvm/lib/CodeGen/GlobalMergeFunctions.cpp | 148 +- 1 file changed, 59 insertions(+), 89 deletions(-) diff --git a/llvm/lib/CodeGen/GlobalMergeFunctions.cpp b/llvm/lib/CodeGen/GlobalMergeFunctions.cpp index 2b367ca87d9008..df8dbb8a73b95d 100644 --- a/llvm/lib/CodeGen/GlobalMergeFunctions.cpp +++ b/llvm/lib/CodeGen/GlobalMergeFunctions.cpp @@ -31,14 +31,6 @@ static cl::opt DisableCGDataForMerging( "merging is still enabled within a module."), cl::init(false)); -STATISTIC(NumMismatchedFunctionHash, - "Number of mismatched function hash for global merge function"); -STATISTIC(NumMismatchedInstCount, - "Number of mismatched instruction count for global merge function"); -STATISTIC(NumMismatchedConstHash, - "Number of mismatched const hash for global merge function"); -STATISTIC(NumMismatchedModuleId, - "Number of mismatched Module Id for global merge function"); STATISTIC(NumMergedFunctions, "Number of functions that are actually merged using function hash"); STATISTIC(NumAnalyzedModues, "Number of modules that are analyzed"); @@ -203,9 +195,9 @@ void GlobalMergeFunc::analyze(Module &M) { struct FuncMergeInfo { StableFunctionMap::StableFunctionEntry *SF; Function *F; - std::unique_ptr IndexInstruction; + IndexInstrMap *IndexInstruction; FuncMergeInfo(StableFunctionMap::StableFunctionEntry *SF, Function *F, -std::unique_ptr IndexInstruction) +IndexInstrMap *IndexInstruction) : SF(SF), F(F), IndexInstruction(std::move(IndexInstruction)) {} }; @@ -420,101 +412,79 @@ static ParamLocsVecTy computeParamInfo( bool GlobalMergeFunc::merge(Module &M, const StableFunctionMap *FunctionMap) { bool Changed = false; - // Build a map from stable function name to function. 
- StringMap StableNameToFuncMap; - for (auto &F : M) -StableNameToFuncMap[get_stable_name(F.getName())] = &F; - // Track merged functions - DenseSet MergedFunctions; - - auto ModId = M.getModuleIdentifier(); - for (auto &[Hash, SFS] : FunctionMap->getFunctionMap()) { -// Parameter locations based on the unique hash sequences -// across the candidates. + // Collect stable functions related to the current module. + DenseMap> HashToFuncs; + DenseMap FuncToFI; + auto &Maps = FunctionMap->getFunctionMap(); + for (auto &F : M) { +if (!isEligibleFunction(&F)) + continue; +auto FI = llvm::StructuralHashWithDifferences(F, ignoreOp); +if (Maps.contains(FI.FunctionHash)) { + HashToFuncs[FI.FunctionHash].push_back(&F); + FuncToFI.try_emplace(&F, std::move(FI)); +} + } + + for (auto &[Hash, Funcs] : HashToFuncs) { std::optional ParamLocsVec; -Function *MergedFunc = nullptr; -std::string MergedModId; SmallVector FuncMergeInfos; -for (auto &SF : SFS) { - // Get the function from the stable name. - auto I = StableNameToFuncMap.find( - *FunctionMap->getNameForId(SF->FunctionNameId)); - if (I == StableNameToFuncMap.end()) -continue; - Function *F = I->second; - assert(F); - // Skip if the function has been merged before. - if (MergedFunctions.count(F)) -continue; - // Consider the function if it is eligible for merging. - if (!isEligibleFunction(F)) -continue; - auto FI = llvm::StructuralHashWithDifferences(*F, ignoreOp); - uint64_t FuncHash = FI.FunctionHash; - if (Hash != FuncHash) { -++NumMismatchedFunctionHash; -continue; - } +// Iterate functions with the same hash. +for (auto &F : Funcs) { + auto &SFS = Maps.at(Hash); + auto &FI = FuncToFI.at(F); - if (SF->InstCount != FI.IndexInstruction->size()) { -++NumMismatchedInstCount; + // Check if the function is compatible with any stable function + // in terms of the number of instructions and ignored operands. 
+ assert(!SFS.empty()); + auto &RFS = SFS[0]; + if (RFS->InstCount != FI.IndexInstruction->size()) continue; - } - bool HasValidSharedConst = true; - for (auto &[Index, Hash] : *SF->IndexOperandHashMap) { -auto [InstIndex, OpndIndex] = Index; -assert(InstIndex < FI.IndexInstruction->size()); -auto *Inst = FI.IndexInstruction->lookup(InstIndex); -if (!ignoreOp(Inst, OpndIndex)) { - HasValidSharedConst = false; - break; -} - } - if (!HasValidSharedConst) { -++NumMismatchedConstHash; -continue; - } - if (!checkConstHashCompatible(*SF->IndexOperandHashMap, -*FI.IndexOperandHashMap)) { -
[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)
https://github.com/kyulee-com edited https://github.com/llvm/llvm-project/pull/115750
[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)
https://github.com/kyulee-com ready_for_review https://github.com/llvm/llvm-project/pull/115750
[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)
@@ -420,101 +412,79 @@ static ParamLocsVecTy computeParamInfo( bool GlobalMergeFunc::merge(Module &M, const StableFunctionMap *FunctionMap) { bool Changed = false; - // Build a map from stable function name to function. - StringMap StableNameToFuncMap; - for (auto &F : M) -StableNameToFuncMap[get_stable_name(F.getName())] = &F; - // Track merged functions - DenseSet MergedFunctions; - - auto ModId = M.getModuleIdentifier(); - for (auto &[Hash, SFS] : FunctionMap->getFunctionMap()) { -// Parameter locations based on the unique hash sequences -// across the candidates. + // Collect stable functions related to the current module. + DenseMap> HashToFuncs; + DenseMap FuncToFI; + auto &Maps = FunctionMap->getFunctionMap(); + for (auto &F : M) { +if (!isEligibleFunction(&F)) + continue; +auto FI = llvm::StructuralHashWithDifferences(F, ignoreOp); +if (Maps.contains(FI.FunctionHash)) { + HashToFuncs[FI.FunctionHash].push_back(&F); + FuncToFI.try_emplace(&F, std::move(FI)); +} + } + + for (auto &[Hash, Funcs] : HashToFuncs) { std::optional ParamLocsVec; -Function *MergedFunc = nullptr; -std::string MergedModId; SmallVector FuncMergeInfos; -for (auto &SF : SFS) { - // Get the function from the stable name. - auto I = StableNameToFuncMap.find( - *FunctionMap->getNameForId(SF->FunctionNameId)); - if (I == StableNameToFuncMap.end()) -continue; - Function *F = I->second; - assert(F); - // Skip if the function has been merged before. - if (MergedFunctions.count(F)) -continue; - // Consider the function if it is eligible for merging. - if (!isEligibleFunction(F)) -continue; - auto FI = llvm::StructuralHashWithDifferences(*F, ignoreOp); - uint64_t FuncHash = FI.FunctionHash; - if (Hash != FuncHash) { -++NumMismatchedFunctionHash; -continue; - } +// Iterate functions with the same hash. 
+for (auto &F : Funcs) { + auto &SFS = Maps.at(Hash); + auto &FI = FuncToFI.at(F); - if (SF->InstCount != FI.IndexInstruction->size()) { -++NumMismatchedInstCount; + // Check if the function is compatible with any stable function + // in terms of the number of instructions and ignored operands. + assert(!SFS.empty()); + auto &RFS = SFS[0];

kyulee-com wrote:

If the codegen data is not stale (meaning there has been no source change in the first or second pass), the size regression is less of a concern. My main worry was about the stability of optimization when the codegen data becomes outdated.

https://github.com/llvm/llvm-project/pull/115750
[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)
kyulee-com wrote: @nocchijiang The new approach seems to be functioning well and is similar in size to the previous method. I suspect that the no-LTO case might still see some slowdown, as each CU needs to read the entire CGData regardless. Currently, the CGData used for this merging process does not use names, which means we could potentially eliminate the strings or make them optional. Alternatively, we could restructure the indexed CGData so that only the relevant hash entries are read on demand. I'd like to leave these options open for now, and if you can continue to improve it, that would be excellent. https://github.com/llvm/llvm-project/pull/115750
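The "read only the relevant hash entries on demand" idea from the comment above could look roughly like the sketch below. This is only an illustration under stated assumptions: `IndexRecord`, its `Offset` field, and `findEntry` are hypothetical names, and the actual indexed CGData format is not laid out this way as far as this thread shows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical index record: a function hash plus the offset of its entries
// in a separate data section. Records are assumed to be sorted by Hash.
struct IndexRecord {
  std::uint64_t Hash;
  std::uint64_t Offset;
};

// Binary-search the sorted index for Hash. A reader would deserialize the
// entries at the returned record's Offset only on a hit, instead of loading
// every entry up front. Returns nullptr when the hash is absent.
const IndexRecord *findEntry(const std::vector<IndexRecord> &Index,
                             std::uint64_t Hash) {
  auto It = std::lower_bound(
      Index.begin(), Index.end(), Hash,
      [](const IndexRecord &R, std::uint64_t H) { return R.Hash < H; });
  if (It == Index.end() || It->Hash != Hash)
    return nullptr;
  return &*It;
}
```

A compilation unit would then query only the hashes of its own eligible functions, so the cost scales with the module rather than with the whole CGData file.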
[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)
kyulee-com wrote:

> Do we know why `OpIdx` is 4 here? This is confusing to me because it looks like there is only one argument, `%5`.

The `ignoreOp` function was initially designed for use with `llvm::StructuralHashWithDifferences`, where it iterates over operands within the same instruction. In that context, `OpIdx` is always within the valid range for the given instruction. However, we now also use this function during the merge operation to determine whether a particular operand of an instruction can be ignored, as matched in the stable function summary; see `hasValidSharedConst()` for its use. So there may be cases where an out-of-range index is passed from a different instruction context (even though the entire function hash matches). In that case, we should simply return false, as the target operand is not an interesting operand (one that can be ignored/parameterized for merging).

https://github.com/llvm/llvm-project/pull/115750
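The out-of-range behavior described above can be sketched in isolation. The snippet deliberately avoids LLVM types: `FakeInst` and `canIgnoreOperand` are hypothetical stand-ins, and the real `ignoreOp` in GlobalMergeFunctions.cpp may apply different criteria for what counts as an ignorable operand.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an instruction: each operand is flagged as either
// a candidate for parameterization or not.
struct FakeInst {
  std::vector<bool> OperandIsParamCandidate;
  std::size_t getNumOperands() const { return OperandIsParamCandidate.size(); }
};

// Returns true only when OpIdx names an existing operand that is a candidate
// for parameterization. An out-of-range index is not treated as an error: it
// just means the operand is not interesting, so the answer is false.
bool canIgnoreOperand(const FakeInst &I, std::size_t OpIdx) {
  if (OpIdx >= I.getNumOperands())
    return false;
  return I.OperandIsParamCandidate[OpIdx];
}
```

With a two-operand instruction like the `@objc_retain` call discussed in this thread, passing index 4 yields false instead of tripping an assertion.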
[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)
https://github.com/kyulee-com edited https://github.com/llvm/llvm-project/pull/115750
[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)
https://github.com/kyulee-com updated https://github.com/llvm/llvm-project/pull/115750 >From 70dcb2ccba98b392c3539f349ccf7fec284a674c Mon Sep 17 00:00:00 2001 From: Kyungwoo Lee Date: Mon, 11 Nov 2024 10:06:56 -0800 Subject: [PATCH 1/3] [CGData] Refactor Global Merge Functions --- llvm/lib/CodeGen/GlobalMergeFunctions.cpp | 148 +- 1 file changed, 59 insertions(+), 89 deletions(-) diff --git a/llvm/lib/CodeGen/GlobalMergeFunctions.cpp b/llvm/lib/CodeGen/GlobalMergeFunctions.cpp index 2b367ca87d9008..df8dbb8a73b95d 100644 --- a/llvm/lib/CodeGen/GlobalMergeFunctions.cpp +++ b/llvm/lib/CodeGen/GlobalMergeFunctions.cpp @@ -31,14 +31,6 @@ static cl::opt DisableCGDataForMerging( "merging is still enabled within a module."), cl::init(false)); -STATISTIC(NumMismatchedFunctionHash, - "Number of mismatched function hash for global merge function"); -STATISTIC(NumMismatchedInstCount, - "Number of mismatched instruction count for global merge function"); -STATISTIC(NumMismatchedConstHash, - "Number of mismatched const hash for global merge function"); -STATISTIC(NumMismatchedModuleId, - "Number of mismatched Module Id for global merge function"); STATISTIC(NumMergedFunctions, "Number of functions that are actually merged using function hash"); STATISTIC(NumAnalyzedModues, "Number of modules that are analyzed"); @@ -203,9 +195,9 @@ void GlobalMergeFunc::analyze(Module &M) { struct FuncMergeInfo { StableFunctionMap::StableFunctionEntry *SF; Function *F; - std::unique_ptr IndexInstruction; + IndexInstrMap *IndexInstruction; FuncMergeInfo(StableFunctionMap::StableFunctionEntry *SF, Function *F, -std::unique_ptr IndexInstruction) +IndexInstrMap *IndexInstruction) : SF(SF), F(F), IndexInstruction(std::move(IndexInstruction)) {} }; @@ -420,101 +412,79 @@ static ParamLocsVecTy computeParamInfo( bool GlobalMergeFunc::merge(Module &M, const StableFunctionMap *FunctionMap) { bool Changed = false; - // Build a map from stable function name to function. 
- StringMap StableNameToFuncMap; - for (auto &F : M) -StableNameToFuncMap[get_stable_name(F.getName())] = &F; - // Track merged functions - DenseSet MergedFunctions; - - auto ModId = M.getModuleIdentifier(); - for (auto &[Hash, SFS] : FunctionMap->getFunctionMap()) { -// Parameter locations based on the unique hash sequences -// across the candidates. + // Collect stable functions related to the current module. + DenseMap> HashToFuncs; + DenseMap FuncToFI; + auto &Maps = FunctionMap->getFunctionMap(); + for (auto &F : M) { +if (!isEligibleFunction(&F)) + continue; +auto FI = llvm::StructuralHashWithDifferences(F, ignoreOp); +if (Maps.contains(FI.FunctionHash)) { + HashToFuncs[FI.FunctionHash].push_back(&F); + FuncToFI.try_emplace(&F, std::move(FI)); +} + } + + for (auto &[Hash, Funcs] : HashToFuncs) { std::optional ParamLocsVec; -Function *MergedFunc = nullptr; -std::string MergedModId; SmallVector FuncMergeInfos; -for (auto &SF : SFS) { - // Get the function from the stable name. - auto I = StableNameToFuncMap.find( - *FunctionMap->getNameForId(SF->FunctionNameId)); - if (I == StableNameToFuncMap.end()) -continue; - Function *F = I->second; - assert(F); - // Skip if the function has been merged before. - if (MergedFunctions.count(F)) -continue; - // Consider the function if it is eligible for merging. - if (!isEligibleFunction(F)) -continue; - auto FI = llvm::StructuralHashWithDifferences(*F, ignoreOp); - uint64_t FuncHash = FI.FunctionHash; - if (Hash != FuncHash) { -++NumMismatchedFunctionHash; -continue; - } +// Iterate functions with the same hash. +for (auto &F : Funcs) { + auto &SFS = Maps.at(Hash); + auto &FI = FuncToFI.at(F); - if (SF->InstCount != FI.IndexInstruction->size()) { -++NumMismatchedInstCount; + // Check if the function is compatible with any stable function + // in terms of the number of instructions and ignored operands. 
+ assert(!SFS.empty()); + auto &RFS = SFS[0]; + if (RFS->InstCount != FI.IndexInstruction->size()) continue; - } - bool HasValidSharedConst = true; - for (auto &[Index, Hash] : *SF->IndexOperandHashMap) { -auto [InstIndex, OpndIndex] = Index; -assert(InstIndex < FI.IndexInstruction->size()); -auto *Inst = FI.IndexInstruction->lookup(InstIndex); -if (!ignoreOp(Inst, OpndIndex)) { - HasValidSharedConst = false; - break; -} - } - if (!HasValidSharedConst) { -++NumMismatchedConstHash; -continue; - } - if (!checkConstHashCompatible(*SF->IndexOperandHashMap, -*FI.IndexOperandHashMap)) { -
[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)
kyulee-com wrote:

> Hit an assertion in `ignoreOp` when testing the refactored code.
>
> ```
> Assertion failed: (OpIdx < I->getNumOperands() && "Invalid operand index"), function ignoreOp, file GlobalMergeFunctions.cpp, line 129.
> Stop reason: hit program assert
> expr I->dump()
> %6 = tail call ptr @objc_retain(ptr %5), !dbg !576
>
> p I->getNumOperands()
>
> (unsigned int) 2
> p OpIdx
>
> (unsigned int) 4
> ```

Thank you for testing and identifying this bug! Since we also use this function to verify any function that matches a hash, it should not assert.

https://github.com/llvm/llvm-project/pull/115750
[llvm-branch-commits] [llvm] test2 (PR #109137)
https://github.com/kyulee-com created https://github.com/llvm/llvm-project/pull/109137

None

>From 32ae0b07276f7ccbdc5dd6675e0c46b507625449 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee
Date: Wed, 18 Sep 2024 06:05:41 -0700
Subject: [PATCH] test2

---
 llvm/lib/LTO/LTO.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/lib/LTO/LTO.cpp b/llvm/lib/LTO/LTO.cpp
index 9a01edd70e08c9..d33815ff704128 100644
--- a/llvm/lib/LTO/LTO.cpp
+++ b/llvm/lib/LTO/LTO.cpp
@@ -1371,6 +1371,7 @@ SmallVector LTO::getRuntimeLibcallSymbols(const Triple &TT) {
 /// This class defines the interface to the ThinLTO backend.
 /// Test
+/// Test2
 class lto::ThinBackendProc {
 protected:
   const Config &Conf;
[llvm-branch-commits] [llvm] [CGData] Refactor Global Merge Functions (PR #115750)
kyulee-com wrote:

> I can confirm that performance has improved significantly in my testing on no-LTO projects, and the slowdown is acceptable now. Before applying the PR it was about a 50% slowdown; now it is ~5%.

That's great to hear! Since these PRs appear to be functioning, is it okay to merge them for now while we continue to discuss further improvements? Or do you have more comments to be addressed?

https://github.com/llvm/llvm-project/pull/115750
[llvm-branch-commits] [llvm] [CodeGen] Limit number of analyzed predecessors (PR #142584)
https://github.com/kyulee-com approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/142584
[llvm-branch-commits] [llvm] [CodeGen] Limit number of analyzed predecessors (PR #142584)
kyulee-com wrote: Adding this threshold check within `isTrellis()` feels somewhat unnatural. If compile time is a concern, could we simply check the size of the function (in terms of the number of blocks, as opposed to predecessors only) early in this pass and either skip the pass or switch to a faster, simpler algorithm? Also, a size of 1000 seems small; maybe 1? https://github.com/llvm/llvm-project/pull/142584
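The alternative suggested above, deciding once at the start of the pass based on block count, might be sketched as follows. Everything here is hypothetical: `LayoutStrategy`, `chooseLayoutStrategy`, and `MaxBlocksForFullLayout` are illustrative names, not existing LLVM identifiers or flags, and the right threshold is exactly what the comment leaves open.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical strategies: the full algorithm versus a cheaper fallback.
enum class LayoutStrategy { Full, Simple };

// Decide once, up front, from the number of basic blocks in the function.
// MaxBlocksForFullLayout is an assumed tuning knob for this sketch.
LayoutStrategy chooseLayoutStrategy(std::size_t NumBlocks,
                                    std::size_t MaxBlocksForFullLayout) {
  return NumBlocks > MaxBlocksForFullLayout ? LayoutStrategy::Simple
                                            : LayoutStrategy::Full;
}
```

The design point is that the size check lives at the pass entry rather than inside a helper such as `isTrellis()`, so the per-predecessor analysis never has to know about the limit.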
[llvm-branch-commits] [llvm] [CodeGen] Limit number of analyzed predecessors (PR #142584)
kyulee-com wrote: Can we add a LIT test case using this flag? I think you could set it to a smaller number to create a test case. https://github.com/llvm/llvm-project/pull/142584