https://github.com/aaupov updated https://github.com/llvm/llvm-project/pull/76904
>From 2027bc7dc00395884c3bd4da21bbb79d079293fc Mon Sep 17 00:00:00 2001
From: Amir Ayupov <aau...@fb.com>
Date: Wed, 3 Jan 2024 21:25:27 -0800
Subject: [PATCH 1/2] [𝘀𝗽𝗿] changes to main this commit is based on
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Created using spr 1.3.4

[skip ci]
---
 bolt/docs/BAT.md                              |  91 ++++++++++++
 .../bolt/Profile/BoltAddressTranslation.h     |  14 +-
 bolt/lib/Profile/BoltAddressTranslation.cpp   | 138 ++++++++++--------
 bolt/lib/Rewrite/RewriteInstance.cpp          |   1 +
 bolt/test/X86/bolt-address-translation.test   |   2 +-
 5 files changed, 180 insertions(+), 66 deletions(-)
 create mode 100644 bolt/docs/BAT.md

diff --git a/bolt/docs/BAT.md b/bolt/docs/BAT.md
new file mode 100644
index 00000000000000..5a257a46955607
--- /dev/null
+++ b/bolt/docs/BAT.md
@@ -0,0 +1,91 @@
+# BOLT Address Translation (BAT)
+# Purpose
+A regular profile collection for BOLT involves collecting samples from
+unoptimized binary. BOLT Address Translation allows collecting profile
+from BOLT-optimized binary and using it for optimizing the input (pre-BOLT)
+binary.
+
+# Overview
+BOLT Address Translation is an extra section (`.note.bolt_bat`) inserted by BOLT
+into the output binary containing translation tables and split functions linkage
+information. This information enables mapping the profile back from optimized
+binary onto the original binary.
+
+# Usage
+`--enable-bat` flag controls the generation of BAT section. Sampled profile
+needs to be passed along with the optimized binary containing BAT section to
+`perf2bolt` which reads BAT section and produces fdata profile for the original
+binary. Note that YAML profile generation is not supported since BAT doesn't
+contain the metadata for input functions.
+
+# Internals
+## Section contents
+The section is organized as follows:
+- Hot functions table
+  - Address translation tables
+- Cold functions table
+
+## Construction and parsing
+BAT section is created from `BoltAddressTranslation` class which captures
+address translation information provided by BOLT linker. It is then encoded as a
+note section in the output binary.
+
+During profile conversion when BAT-enabled binary is passed to perf2bolt,
+`BoltAddressTranslation` class is populated from BAT section. The class is then
+queried by `DataAggregator` during sample processing to reconstruct addresses/
+offsets in the input binary.
+
+## Encoding format
+The encoding is specified in bolt/include/bolt/Profile/BoltAddressTranslation.h
+and bolt/lib/Profile/BoltAddressTranslation.cpp.
+
+### Layout
+The general layout is as follows:
+```
+Hot functions table header
+|------------------|
+|  Function entry  |
+| |--------------| |
+| | OutOff InOff | |
+| |--------------| |
+~~~~~~~~~~~~~~~~~~~~
+
+Cold functions table header
+|------------------|
+|  Function entry  |
+| |--------------| |
+| | OutOff InOff | |
+| |--------------| |
+~~~~~~~~~~~~~~~~~~~~
+```
+
+### Functions table
+Hot and cold functions tables share the encoding except difference marked below.
+Header:
+| Entry | Encoding | Description |
+| ------ | ----- | ----------- |
+| `NumFuncs` | ULEB128 | Number of functions in the functions table |
+
+The header is followed by Functions table with `NumFuncs` entries.
+Output binary addresses are delta encoded, meaning that only the difference with
+the previous output address is stored. Addresses implicitly start at zero.
+Hot indices are delta encoded, implicitly starting at zero.
+| Entry | Encoding | Description |
+| ------ | ------| ----------- |
+| `Address` | Delta, ULEB128 | Function address in the output binary |
+| `HotIndex` | Delta, ULEB128 | Cold functions only: index of corresponding hot function in hot functions table |
+| `NumEntries` | ULEB128 | Number of address translation entries for a function |
+Function header is followed by `NumEntries` pairs of offsets for current
+function.
+
+### Address translation table
+Delta encoding means that only the difference with the previous corresponding
+entry is encoded. Offsets implicitly start at zero.
+| Entry | Encoding | Description |
+| ------ | ------| ----------- |
+| `OutputAddr` | Delta, ULEB128 | Function offset in output binary |
+| `InputAddr` | Delta, SLEB128 | Function offset in input binary with `BRANCHENTRY` LSB bit |
+
+`BRANCHENTRY` bit denotes whether a given offset pair is a control flow source
+(branch or call instruction). If not set, it signifies a control flow target
+(basic block offset).
diff --git a/bolt/include/bolt/Profile/BoltAddressTranslation.h b/bolt/include/bolt/Profile/BoltAddressTranslation.h
index 07e4b283211c69..feeda2ca1871be 100644
--- a/bolt/include/bolt/Profile/BoltAddressTranslation.h
+++ b/bolt/include/bolt/Profile/BoltAddressTranslation.h
@@ -11,6 +11,7 @@
 
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/Support/DataExtractor.h"
 #include <cstdint>
 #include <map>
 #include <optional>
@@ -78,10 +79,20 @@ class BoltAddressTranslation {
 
   BoltAddressTranslation() {}
 
+  /// Write the serialized address translation table for a function.
+  template <bool Cold>
+  void writeMaps(std::map<uint64_t, MapTy> &Maps, raw_ostream &OS);
+
   /// Write the serialized address translation tables for each reordered
   /// function
   void write(const BinaryContext &BC, raw_ostream &OS);
 
+  /// Read the serialized address translation table for a function.
+  /// Return a parse error if failed.
+  template <bool Cold>
+  void parseMaps(std::vector<uint64_t> &HotFuncs, DataExtractor &DE,
+                 uint64_t &Offset, Error &Err);
+
   /// Read the serialized address translation tables and load them internally
   /// in memory. Return a parse error if failed.
   std::error_code parse(StringRef Buf);
@@ -119,13 +130,14 @@ class BoltAddressTranslation {
                          uint64_t FuncAddress);
 
   std::map<uint64_t, MapTy> Maps;
+  std::map<uint64_t, MapTy> ColdMaps;
 
   /// Links outlined cold bocks to their original function
   std::map<uint64_t, uint64_t> ColdPartSource;
 
   /// Identifies the address of a control-flow changing instructions in a
   /// translation map entry
-  const static uint32_t BRANCHENTRY = 0x80000000;
+  const static uint32_t BRANCHENTRY = 0x1;
 };
 
 } // namespace bolt
diff --git a/bolt/lib/Profile/BoltAddressTranslation.cpp b/bolt/lib/Profile/BoltAddressTranslation.cpp
index e004309e0e2136..a62b9fc69ef0d0 100644
--- a/bolt/lib/Profile/BoltAddressTranslation.cpp
+++ b/bolt/lib/Profile/BoltAddressTranslation.cpp
@@ -8,8 +8,9 @@
 
 #include "bolt/Profile/BoltAddressTranslation.h"
 #include "bolt/Core/BinaryFunction.h"
-#include "llvm/Support/DataExtractor.h"
 #include "llvm/Support/Errc.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Support/LEB128.h"
 
 #define DEBUG_TYPE "bolt-bat"
 
@@ -44,7 +45,7 @@ void BoltAddressTranslation::writeEntriesForBB(MapTy &Map,
   // and this deleted block will both share the same output address (the same
   // key), and we need to map back. We choose here to privilege the successor by
   // allowing it to overwrite the previously inserted key in the map.
-  Map[BBOutputOffset] = BBInputOffset;
+  Map[BBOutputOffset] = BBInputOffset << 1;
 
   const auto &IOAddressMap =
       BB.getFunction()->getBinaryContext().getIOAddressMap();
@@ -61,8 +62,8 @@ void BoltAddressTranslation::writeEntriesForBB(MapTy &Map,
     LLVM_DEBUG(dbgs() << " Key: " << Twine::utohexstr(OutputOffset)
                       << " Val: " << Twine::utohexstr(InputOffset)
                       << " (branch)\n");
-    Map.insert(
-        std::pair<uint32_t, uint32_t>(OutputOffset, InputOffset | BRANCHENTRY));
+    Map.insert(std::pair<uint32_t, uint32_t>(OutputOffset,
+                                             (InputOffset << 1) | BRANCHENTRY));
   }
 }
 
@@ -96,41 +97,51 @@ void BoltAddressTranslation::write(const BinaryContext &BC, raw_ostream &OS) {
       for (const BinaryBasicBlock *const BB : FF)
         writeEntriesForBB(Map, *BB, FF.getAddress());
 
-      Maps.emplace(FF.getAddress(), std::move(Map));
+      ColdMaps.emplace(FF.getAddress(), std::move(Map));
       ColdPartSource.emplace(FF.getAddress(), Function.getOutputAddress());
     }
   }
 
+  writeMaps</*Cold=*/false>(Maps, OS);
+  writeMaps</*Cold=*/true>(ColdMaps, OS);
+
+  outs() << "BOLT-INFO: Wrote " << Maps.size() + ColdMaps.size()
+         << " BAT maps\n";
+}
+
+template <bool Cold>
+void BoltAddressTranslation::writeMaps(std::map<uint64_t, MapTy> &Maps,
+                                       raw_ostream &OS) {
   const uint32_t NumFuncs = Maps.size();
-  OS.write(reinterpret_cast<const char *>(&NumFuncs), 4);
-  LLVM_DEBUG(dbgs() << "Writing " << NumFuncs << " functions for BAT.\n");
+  encodeULEB128(NumFuncs, OS);
+  LLVM_DEBUG(dbgs() << "Writing " << NumFuncs << (Cold ? " cold" : "")
+                    << " functions for BAT.\n");
+  size_t PrevIndex = 0;
+  // Output addresses are delta-encoded
+  uint64_t PrevAddress = 0;
   for (auto &MapEntry : Maps) {
     const uint64_t Address = MapEntry.first;
     MapTy &Map = MapEntry.second;
     const uint32_t NumEntries = Map.size();
     LLVM_DEBUG(dbgs() << "Writing " << NumEntries << " entries for 0x"
                       << Twine::utohexstr(Address) << ".\n");
-    OS.write(reinterpret_cast<const char *>(&Address), 8);
-    OS.write(reinterpret_cast<const char *>(&NumEntries), 4);
+    encodeULEB128(Address - PrevAddress, OS);
+    PrevAddress = Address;
+    if (Cold) {
+      size_t HotIndex =
+          std::distance(ColdPartSource.begin(), ColdPartSource.find(Address));
+      encodeULEB128(HotIndex - PrevIndex, OS);
+      PrevIndex = HotIndex;
+    }
+    encodeULEB128(NumEntries, OS);
+    uint64_t InOffset = 0, OutOffset = 0;
+    // Output and Input addresses and delta-encoded
     for (std::pair<const uint32_t, uint32_t> &KeyVal : Map) {
-      OS.write(reinterpret_cast<const char *>(&KeyVal.first), 4);
-      OS.write(reinterpret_cast<const char *>(&KeyVal.second), 4);
+      encodeULEB128(KeyVal.first - OutOffset, OS);
+      encodeSLEB128(KeyVal.second - InOffset, OS);
+      std::tie(OutOffset, InOffset) = KeyVal;
     }
   }
-  const uint32_t NumColdEntries = ColdPartSource.size();
-  LLVM_DEBUG(dbgs() << "Writing " << NumColdEntries
-                    << " cold part mappings.\n");
-  OS.write(reinterpret_cast<const char *>(&NumColdEntries), 4);
-  for (std::pair<const uint64_t, uint64_t> &ColdEntry : ColdPartSource) {
-    OS.write(reinterpret_cast<const char *>(&ColdEntry.first), 8);
-    OS.write(reinterpret_cast<const char *>(&ColdEntry.second), 8);
-    LLVM_DEBUG(dbgs() << " " << Twine::utohexstr(ColdEntry.first) << " -> "
-                      << Twine::utohexstr(ColdEntry.second) << "\n");
-  }
-
-  outs() << "BOLT-INFO: Wrote " << Maps.size() << " BAT maps\n";
-  outs() << "BOLT-INFO: Wrote " << NumColdEntries
-         << " BAT cold-to-hot entries\n";
 }
 
 std::error_code BoltAddressTranslation::parse(StringRef Buf) {
@@ -152,53 +163,52 @@ std::error_code BoltAddressTranslation::parse(StringRef Buf) {
   if (Name.substr(0, 4) != "BOLT")
     return make_error_code(llvm::errc::io_error);
 
-  if (Buf.size() - Offset < 4)
-    return make_error_code(llvm::errc::io_error);
+  Error Err(Error::success());
+  std::vector<uint64_t> HotFuncs;
+  parseMaps</*Cold=*/false>(HotFuncs, DE, Offset, Err);
+  parseMaps</*Cold=*/true>(HotFuncs, DE, Offset, Err);
+  outs() << "BOLT-INFO: Parsed " << Maps.size() << " BAT entries\n";
+  return errorToErrorCode(std::move(Err));
+}
 
-  const uint32_t NumFunctions = DE.getU32(&Offset);
-  LLVM_DEBUG(dbgs() << "Parsing " << NumFunctions << " functions\n");
+template <bool Cold>
+void BoltAddressTranslation::parseMaps(std::vector<uint64_t> &HotFuncs,
+                                       DataExtractor &DE, uint64_t &Offset,
+                                       Error &Err) {
+  const uint32_t NumFunctions = DE.getULEB128(&Offset, &Err);
+  LLVM_DEBUG(dbgs() << "Parsing " << NumFunctions << (Cold ? " cold" : "")
+                    << " functions\n");
+  size_t HotIndex = 0;
+  uint64_t PrevAddress = 0;
   for (uint32_t I = 0; I < NumFunctions; ++I) {
-    if (Buf.size() - Offset < 12)
-      return make_error_code(llvm::errc::io_error);
-
-    const uint64_t Address = DE.getU64(&Offset);
-    const uint32_t NumEntries = DE.getU32(&Offset);
+    const uint64_t Address = PrevAddress + DE.getULEB128(&Offset, &Err);
+    PrevAddress = Address;
+    if (Cold) {
+      HotIndex += DE.getULEB128(&Offset, &Err);
+      ColdPartSource.emplace(Address, HotFuncs[HotIndex]);
+    } else {
+      HotFuncs.push_back(Address);
+    }
+    const uint32_t NumEntries = DE.getULEB128(&Offset, &Err);
     MapTy Map;
 
     LLVM_DEBUG(dbgs() << "Parsing " << NumEntries << " entries for 0x"
                       << Twine::utohexstr(Address) << "\n");
-    if (Buf.size() - Offset < 8 * NumEntries)
-      return make_error_code(llvm::errc::io_error);
+    uint64_t InputOffset = 0, OutputOffset = 0;
     for (uint32_t J = 0; J < NumEntries; ++J) {
-      const uint32_t OutputAddr = DE.getU32(&Offset);
-      const uint32_t InputAddr = DE.getU32(&Offset);
-      Map.insert(std::pair<uint32_t, uint32_t>(OutputAddr, InputAddr));
-      LLVM_DEBUG(dbgs() << Twine::utohexstr(OutputAddr) << " -> "
-                        << Twine::utohexstr(InputAddr) << "\n");
+      const uint64_t OutputDelta = DE.getULEB128(&Offset, &Err);
+      const int64_t InputDelta = DE.getSLEB128(&Offset, &Err);
+      OutputOffset += OutputDelta;
+      InputOffset += InputDelta;
+      Map.insert(std::pair<uint32_t, uint32_t>(OutputOffset, InputOffset));
+      LLVM_DEBUG(dbgs() << formatv("{0:x} -> {1:x} ({2}/{3}b -> {4}/{5}b)\n",
+                                   OutputOffset, InputOffset, OutputDelta,
+                                   encodeULEB128(OutputDelta, nulls()),
+                                   InputDelta,
+                                   encodeSLEB128(InputDelta, nulls())));
     }
     Maps.insert(std::pair<uint64_t, MapTy>(Address, Map));
   }
-
-  if (Buf.size() - Offset < 4)
-    return make_error_code(llvm::errc::io_error);
-
-  const uint32_t NumColdEntries = DE.getU32(&Offset);
-  LLVM_DEBUG(dbgs() << "Parsing " << NumColdEntries << " cold part mappings\n");
-  for (uint32_t I = 0; I < NumColdEntries; ++I) {
-    if (Buf.size() - Offset < 16)
-      return make_error_code(llvm::errc::io_error);
-    const uint32_t ColdAddress = DE.getU64(&Offset);
-    const uint32_t HotAddress = DE.getU64(&Offset);
-    ColdPartSource.insert(
-        std::pair<uint64_t, uint64_t>(ColdAddress, HotAddress));
-    LLVM_DEBUG(dbgs() << Twine::utohexstr(ColdAddress) << " -> "
-                      << Twine::utohexstr(HotAddress) << "\n");
-  }
-  outs() << "BOLT-INFO: Parsed " << Maps.size() << " BAT entries\n";
-  outs() << "BOLT-INFO: Parsed " << NumColdEntries
-         << " BAT cold-to-hot entries\n";
-
-  return std::error_code();
 }
 
 void BoltAddressTranslation::dump(raw_ostream &OS) {
@@ -209,7 +219,7 @@ void BoltAddressTranslation::dump(raw_ostream &OS) {
     OS << "BB mappings:\n";
     for (const auto &Entry : MapEntry.second) {
       const bool IsBranch = Entry.second & BRANCHENTRY;
-      const uint32_t Val = Entry.second & ~BRANCHENTRY;
+      const uint32_t Val = Entry.second >> 1; // dropping BRANCHENTRY bit
       OS << "0x" << Twine::utohexstr(Entry.first) << " -> "
          << "0x" << Twine::utohexstr(Val);
       if (IsBranch)
@@ -244,7 +254,7 @@ uint64_t BoltAddressTranslation::translate(uint64_t FuncAddress,
 
   --KeyVal;
 
-  const uint32_t Val = KeyVal->second & ~BRANCHENTRY;
+  const uint32_t Val = KeyVal->second >> 1; // dropping BRANCHENTRY bit
   // Branch source addresses are translated to the first instruction of the
   // source BB to avoid accounting for modifications BOLT may have made in the
   // BB regarding deletion/addition of instructions.
diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp
index a95b1650753cfd..f5a8a5b7168745 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -4112,6 +4112,7 @@ void RewriteInstance::encodeBATSection() {
                                   copyByteArray(BoltInfo), BoltInfo.size(),
                                   /*Alignment=*/1,
                                   /*IsReadOnly=*/true, ELF::SHT_NOTE);
+  outs() << "BOLT-INFO: BAT section size (bytes): " << BoltInfo.size() << '\n';
 }
 
 template <typename ELFShdrTy>
diff --git a/bolt/test/X86/bolt-address-translation.test b/bolt/test/X86/bolt-address-translation.test
index f68a8f7e9bcb7f..430b4cb007310f 100644
--- a/bolt/test/X86/bolt-address-translation.test
+++ b/bolt/test/X86/bolt-address-translation.test
@@ -36,7 +36,7 @@
 #
 # CHECK: BOLT: 3 out of 7 functions were overwritten.
 # CHECK: BOLT-INFO: Wrote 6 BAT maps
-# CHECK: BOLT-INFO: Wrote 3 BAT cold-to-hot entries
+# CHECK: BOLT-INFO: BAT section size (bytes): 404
 #
 # usqrt mappings (hot part). We match against any key (left side containing
 # the bolted binary offsets) because BOLT may change where it puts instructions

>From ed540bbda0ccbe8d5bce0c150a2777f5580043d5 Mon Sep 17 00:00:00 2001
From: Amir Ayupov <aau...@fb.com>
Date: Thu, 4 Jan 2024 07:23:43 -0800
Subject: [PATCH 2/2] [𝘀𝗽𝗿] changes introduced through rebase
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Created using spr 1.3.4

[skip ci]
---
 bolt/docs/BAT.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/bolt/docs/BAT.md b/bolt/docs/BAT.md
index 5a257a46955607..a096e5c5f5a82b 100644
--- a/bolt/docs/BAT.md
+++ b/bolt/docs/BAT.md
@@ -12,7 +12,7 @@ information. This information enables mapping the profile back from optimized
 binary onto the original binary.
 
 # Usage
-`--enable-bat` flag controls the generation of BAT section. Sampled profile
+`--enable-bat` flag controls the generation of BAT section. Sampled profile
 needs to be passed along with the optimized binary containing BAT section to
 `perf2bolt` which reads BAT section and produces fdata profile for the original
 binary. Note that YAML profile generation is not supported since BAT doesn't
@@ -30,14 +30,15 @@ BAT section is created from `BoltAddressTranslation` class which captures
 address translation information provided by BOLT linker. It is then encoded as a
 note section in the output binary.
 
-During profile conversion when BAT-enabled binary is passed to perf2bolt,
+During profile conversion when BAT-enabled binary is passed to perf2bolt,
 `BoltAddressTranslation` class is populated from BAT section. The class is then
 queried by `DataAggregator` during sample processing to reconstruct addresses/
 offsets in the input binary.
 
 ## Encoding format
-The encoding is specified in bolt/include/bolt/Profile/BoltAddressTranslation.h
-and bolt/lib/Profile/BoltAddressTranslation.cpp.
+The encoding is specified in
+[BoltAddressTranslation.h](/bolt/include/bolt/Profile/BoltAddressTranslation.h)
+and [BoltAddressTranslation.cpp](/bolt/lib/Profile/BoltAddressTranslation.cpp).
 
 ### Layout
 The general layout is as follows:
@@ -75,6 +76,7 @@ Hot indices are delta encoded, implicitly starting at zero.
 | `Address` | Delta, ULEB128 | Function address in the output binary |
 | `HotIndex` | Delta, ULEB128 | Cold functions only: index of corresponding hot function in hot functions table |
 | `NumEntries` | ULEB128 | Number of address translation entries for a function |
+
 Function header is followed by `NumEntries` pairs of offsets for current
 function.
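
For readers following the address translation table encoding described in the BAT.md added by this patch, here is a minimal stand-alone sketch of one function's entries: a ULEB128 `NumEntries`, then delta-encoded `(OutputAddr, InputAddr)` pairs with the `BRANCHENTRY` flag in the least significant bit of the input offset. It is not the patch's code: the patch uses `encodeULEB128`/`encodeSLEB128` from `llvm/Support/LEB128.h` and `DataExtractor`, whereas `putULEB128`, `putSLEB128`, `getULEB128`, `getSLEB128`, `encodeEntries`, `decodeEntries`, `Entry` and `kBranchEntry` below are hypothetical names chosen for illustration, and the only input validation is an `assert`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for BRANCHENTRY (0x1 after this patch).
constexpr uint32_t kBranchEntry = 0x1;

// Unsigned LEB128: 7 payload bits per byte, high bit set if more bytes follow.
void putULEB128(uint64_t V, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = V & 0x7f;
    V >>= 7;
    if (V != 0)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (V != 0);
}

// Signed LEB128; relies on arithmetic right shift of negative values
// (guaranteed since C++20, universal in practice before that).
void putSLEB128(int64_t V, std::vector<uint8_t> &Out) {
  bool More = true;
  while (More) {
    uint8_t Byte = V & 0x7f;
    V >>= 7;
    const bool SignBit = (Byte & 0x40) != 0;
    More = !((V == 0 && !SignBit) || (V == -1 && SignBit));
    if (More)
      Byte |= 0x80;
    Out.push_back(Byte);
  }
}

uint64_t getULEB128(const std::vector<uint8_t> &In, size_t &Pos) {
  uint64_t Result = 0;
  unsigned Shift = 0;
  uint8_t Byte;
  do {
    assert(Pos < In.size() && Shift < 64 && "truncated or oversized ULEB128");
    Byte = In[Pos++];
    Result |= uint64_t(Byte & 0x7f) << Shift;
    Shift += 7;
  } while (Byte & 0x80);
  return Result;
}

int64_t getSLEB128(const std::vector<uint8_t> &In, size_t &Pos) {
  uint64_t Result = 0;
  unsigned Shift = 0;
  uint8_t Byte;
  do {
    assert(Pos < In.size() && Shift < 64 && "truncated or oversized SLEB128");
    Byte = In[Pos++];
    Result |= uint64_t(Byte & 0x7f) << Shift;
    Shift += 7;
  } while (Byte & 0x80);
  if (Shift < 64 && (Byte & 0x40))
    Result |= ~uint64_t(0) << Shift; // sign-extend
  return static_cast<int64_t>(Result);
}

struct Entry {
  uint32_t OutOff; // offset in the output (BOLTed) function
  uint32_t InOff;  // offset in the input function, flag bit removed
  bool IsBranch;   // BRANCHENTRY: control flow source rather than target
};

// Map holds (output offset, (input offset << 1) | flag), sorted by output
// offset, mirroring what writeEntriesForBB stores after this patch.
std::vector<uint8_t>
encodeEntries(const std::vector<std::pair<uint32_t, uint32_t>> &Map) {
  std::vector<uint8_t> Out;
  putULEB128(Map.size(), Out);
  uint64_t PrevOut = 0;
  int64_t PrevIn = 0;
  for (const auto &KV : Map) {
    putULEB128(KV.first - PrevOut, Out);                     // never negative
    putSLEB128(static_cast<int64_t>(KV.second) - PrevIn, Out); // may be negative
    PrevOut = KV.first;
    PrevIn = KV.second;
  }
  return Out;
}

std::vector<Entry> decodeEntries(const std::vector<uint8_t> &In) {
  size_t Pos = 0;
  const uint64_t NumEntries = getULEB128(In, Pos);
  std::vector<Entry> Result;
  uint64_t OutOff = 0;
  int64_t Packed = 0;
  for (uint64_t I = 0; I < NumEntries; ++I) {
    OutOff += getULEB128(In, Pos);
    Packed += getSLEB128(In, Pos);
    const uint32_t Val = static_cast<uint32_t>(Packed);
    Result.push_back({static_cast<uint32_t>(OutOff), Val >> 1,
                      (Val & kBranchEntry) != 0});
  }
  return Result;
}
```

Output offsets are map keys in ascending order, so their deltas fit ULEB128; input offsets can move backwards after block reordering, which is why the second column needs the signed variant. Moving the flag from bit 31 to bit 0 keeps the packed values, and therefore their deltas, small enough to fit in one or two LEB128 bytes in the common case.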