Author: Alex Lorenz
Date: 2022-02-14T09:27:44-08:00
New Revision: 00cd6c04202acf71f74c670b2dd4343929d1f45f
URL:
https://github.com/llvm/llvm-project/commit/00cd6c04202acf71f74c670b2dd4343929d1f45f
DIFF:
https://github.com/llvm/llvm-project/commit/00cd6c04202acf71f74c670b2dd4343929d1f45f.diff
LOG: [Preprocessor] Reduce the memory overhead of `#define` directives
(Recommit)
Recently we observed high memory pressure caused by clang during some parallel
builds. We discovered that several of our projects have a very large number of
#define directives in their TUs (on the order of millions), which caused huge
memory consumption in clang due to the many allocations made for MacroInfo. We
would like to reduce clang's memory overhead for a single #define, and thereby
for these files, so that we can reduce the memory pressure on the system during
highly parallel builds. This change achieves that by removing the SmallVector
in MacroInfo and instead storing the tokens in an array allocated using the
bump pointer allocator, after all tokens are lexed.
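The idea can be sketched outside of clang as follows. This is only a minimal illustration under stated assumptions, not the code from this commit: `Tok`, `ToyBumpAllocator` and `MacroBody` are hypothetical stand-ins for clang's Token, llvm::BumpPtrAllocator and MacroInfo. Tokens are first collected in a reusable scratch vector, and once the macro body is fully lexed they are copied into an exactly-sized array carved out of the arena, so each macro keeps only a pointer and a count.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for clang's Token (trivially copyable).
struct Tok {
  unsigned Kind;
  unsigned Loc;
};

// Toy bump allocator (an assumption for illustration, not llvm::BumpPtrAllocator):
// hands out memory from fixed-size slabs and never frees individual objects.
class ToyBumpAllocator {
  static constexpr std::size_t SlabSize = 4096;
  std::vector<std::unique_ptr<unsigned char[]>> Slabs;
  std::size_t Offset = 0;

public:
  // Align must be a power of two.
  void *allocate(std::size_t Bytes, std::size_t Align) {
    assert(Bytes <= SlabSize && "toy allocator: request too large");
    std::size_t Aligned = (Offset + Align - 1) & ~(Align - 1);
    if (Slabs.empty() || Aligned + Bytes > SlabSize) {
      // Start a fresh slab when none exists or the current one is full.
      Slabs.push_back(std::make_unique<unsigned char[]>(SlabSize));
      Aligned = 0;
    }
    Offset = Aligned + Bytes;
    return Slabs.back().get() + Aligned;
  }
  std::size_t numSlabs() const { return Slabs.size(); }
};

// Hypothetical stand-in for MacroInfo: a pointer and a count instead of an
// embedded small vector with inline storage.
struct MacroBody {
  const Tok *Tokens = nullptr;
  unsigned NumTokens = 0;

  // Copy the fully lexed tokens into an exactly-sized arena array.
  void setTokens(const std::vector<Tok> &Lexed, ToyBumpAllocator &A) {
    assert(Tokens == nullptr && NumTokens == 0 && "already allocated");
    if (Lexed.empty())
      return;
    void *Mem = A.allocate(Lexed.size() * sizeof(Tok), alignof(Tok));
    Tok *Dst = static_cast<Tok *>(Mem);
    std::uninitialized_copy(Lexed.begin(), Lexed.end(), Dst);
    Tokens = Dst;
    NumTokens = static_cast<unsigned>(Lexed.size());
  }
};
```

A caller would fill one scratch `std::vector<Tok>` while lexing each directive, call `setTokens` once at the end of the directive, and then clear the scratch vector for reuse. The per-macro footprint is then the pointer, the count, and exactly the tokens that were lexed.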
The added unit test with 1000000 #define directives illustrates the problem.
Prior to this change, on arm64 macOS, clang's PP bump pointer allocator
allocated 272007616 bytes, and used roughly 272 bytes per #define. After this
change, clang's PP bump pointer allocator allocates 120002016 bytes, and uses
only roughly 120 bytes per #define.
For an example test file that we have internally with 7.8 million #define
directives, this change produces the following improvement on arm64 macOS: the
persistent allocation footprint for this test case file as it's being compiled
to LLVM IR went down 22% from 5.28 GB to 4.07 GB, and the total allocations
went down 14% from 8.26 GB to 7.05 GB. Furthermore, this change reduced the
total number of allocations made by the system for this clang invocation from
1454853 to 133663, an order of magnitude improvement.
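The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the directive count of 1,000,000 for the unit test is an assumption inferred from the byte totals and the per-#define figures.

```cpp
#include <cassert>
#include <cmath>

// Average bytes allocated per #define directive.
double bytesPerDefine(double TotalBytes, double NumDefines) {
  return TotalBytes / NumDefines;
}

// Relative reduction from Before to After, in percent.
double percentReduction(double Before, double After) {
  return 100.0 * (Before - After) / Before;
}

// How many times smaller After is than Before.
double improvementFactor(double Before, double After) {
  return Before / After;
}
```

With these helpers, `bytesPerDefine(272007616, 1000000)` is about 272 and `bytesPerDefine(120002016, 1000000)` is about 120. `percentReduction(5.28, 4.07)` comes to roughly 22.9 and `percentReduction(8.26, 7.05)` to roughly 14.6, consistent with the 22% and 14% reported. `improvementFactor(1454853, 133663)` is roughly 10.9, which matches the stated order of magnitude.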
The recommit fixes the LLDB build failure.
Differential Revision: https://reviews.llvm.org/D117348
Added:
clang/unittests/Lex/PPMemoryAllocationsTest.cpp
Modified:
clang/include/clang/Lex/MacroInfo.h
clang/lib/Lex/MacroInfo.cpp
clang/lib/Lex/PPDirectives.cpp
clang/lib/Serialization/ASTReader.cpp
clang/lib/Serialization/ASTWriter.cpp
clang/unittests/Lex/CMakeLists.txt
lldb/source/Plugins/ExpressionParser/Clang/ClangModulesDeclVendor.cpp
Removed:
diff --git a/clang/include/clang/Lex/MacroInfo.h b/clang/include/clang/Lex/MacroInfo.h
index 0347a7a37186b..1947bc8fc509e 100644
--- a/clang/include/clang/Lex/MacroInfo.h
+++ b/clang/include/clang/Lex/MacroInfo.h
@@ -54,11 +54,14 @@ class MacroInfo {
/// macro, this includes the \c __VA_ARGS__ identifier on the list.
IdentifierInfo **ParameterList = nullptr;
+ /// This is the list of tokens that the macro is defined to.
+ const Token *ReplacementTokens = nullptr;
+
/// \see ParameterList
unsigned NumParameters = 0;
- /// This is the list of tokens that the macro is defined to.
- SmallVector<Token, 8> ReplacementTokens;
+ /// \see ReplacementTokens
+ unsigned NumReplacementTokens = 0;
/// Length in characters of the macro definition.
mutable unsigned DefinitionLength;
@@ -230,26 +233,47 @@ class MacroInfo {
bool isWarnIfUnused() const { return IsWarnIfUnused; }
/// Return the number of tokens that this macro expands to.
- unsigned getNumTokens() const { return ReplacementTokens.size(); }
+ unsigned getNumTokens() const { return NumReplacementTokens; }
const Token &getReplacementToken(unsigned Tok) const {
-   assert(Tok < ReplacementTokens.size() && "Invalid token #");
+   assert(Tok < NumReplacementTokens && "Invalid token #");
return ReplacementTokens[Tok];
}
- using tokens_iterator = SmallVectorImpl<Token>::const_iterator;
+ using const_tokens_iterator = const Token *;
- tokens_iterator tokens_begin() const { return ReplacementTokens.begin(); }
- tokens_iterator tokens_end() const { return ReplacementTokens.end(); }
- bool tokens_empty() const { return ReplacementTokens.empty(); }
- ArrayRef<Token> tokens() const { return ReplacementTokens; }
+ const_tokens_iterator tokens_begin() const { return ReplacementTokens; }
+ const_tokens_iterator tokens_end() const {
+   return ReplacementTokens + NumReplacementTokens;
+ }
+ bool tokens_empty() const { return NumReplacementTokens == 0; }
+ ArrayRef<Token> tokens() const {
+   return llvm::makeArrayRef(ReplacementTokens, NumReplacementTokens);
+ }
- /// Add the specified token to the replacement text for the macro.
- void AddTokenToBody(const Token &Tok) {
+ llvm::MutableArrayRef<Token>
+ allocateTokens(unsigned NumTokens, llvm::BumpPtrAllocator &PPAllocator) {
+   assert(ReplacementTokens == nullptr && NumReplacementTokens == 0 &&
+          "Token list already allocated!");
+   NumReplacementTokens = NumTokens;
+Token *NewReplacementT