[clang] 5d8d3a1 - [NFC] Increase initial size of FoldingSets used in ASTContext and CodeGenTypes

2022-02-08 Thread Dawid Jurczak via cfe-commits

Author: Dawid Jurczak
Date: 2022-02-08T17:54:04+01:00
New Revision: 5d8d3a11c4d4ed8bb610f60f8fa37b8043a40acd

URL: 
https://github.com/llvm/llvm-project/commit/5d8d3a11c4d4ed8bb610f60f8fa37b8043a40acd
DIFF: 
https://github.com/llvm/llvm-project/commit/5d8d3a11c4d4ed8bb610f60f8fa37b8043a40acd.diff

LOG: [NFC] Increase initial size of FoldingSets used in ASTContext and 
CodeGenTypes

Among the many FoldingSet users, the most notable seem to be ASTContext and
CodeGenTypes.
The reasons we spend a not-so-tiny amount of time in FoldingSet calls made from
there are the following:

  1. The default FoldingSet capacity of 2^6 items is very often not enough.
 For PointerTypes/ElaboratedTypes/ParenTypes it is not unusual to see the set
grow to 256 or 512 items.
 FunctionProtoTypes can easily exceed a capacity of 1k items, growing to 4k or
even 8k.

  2. The cost of FoldingSetBase::GrowBucketCount itself is not very bad (pure
reallocations are rather cheap thanks to BumpPtrAllocator).
 What matters is the high collision rate when a lot of items end up in the same
bucket, which slows down FoldingSetBase::FindNodeOrInsertPos and thrashes the
CPU cache
 (items with the same hash are organized in an intrusive linked list which needs
to be traversed).

This change addresses both issues by increasing the initial size of the
FoldingSets used in ASTContext and CodeGenTypes.
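
To make the sizes above concrete, here is a rough sketch of how many bucket
doublings a larger initial size can save. It is a toy model, not LLVM code:
doublingsNeeded is a hypothetical helper, and it assumes the table simply
doubles until it has one bucket per item, which ignores the load factor the
real FoldingSet applies.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (assumption, not the LLVM implementation): a table starts with
// 2^Log2InitSize buckets and doubles until it has at least as many buckets
// as items. Returns the number of doublings performed.
inline unsigned doublingsNeeded(unsigned Log2InitSize, std::uint64_t Items) {
  assert(Log2InitSize <= 62 && "initial bucket count must fit in 64 bits");
  assert(Items <= (std::uint64_t{1} << 62) && "item count too large for model");
  std::uint64_t Buckets = std::uint64_t{1} << Log2InitSize;
  unsigned Doublings = 0;
  while (Buckets < Items) {
    Buckets <<= 1; // each growth step doubles the bucket count
    ++Doublings;
  }
  return Doublings;
}
```

Under this model, doublingsNeeded(6, 512) counts the growth steps for a set
that starts at the default 2^6 and ends up holding 512 items, while
doublingsNeeded(9, 512) does the same for the GeneralTypesLog2InitSize value
introduced in the diff below.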

Extracted from: https://reviews.llvm.org/D118385

Differential Revision: https://reviews.llvm.org/D118608

Added: 


Modified: 
clang/include/clang/AST/ASTContext.h
clang/lib/AST/ASTContext.cpp
clang/lib/CodeGen/CodeGenTypes.h

Removed: 




diff  --git a/clang/include/clang/AST/ASTContext.h 
b/clang/include/clang/AST/ASTContext.h
index 63c11e237d6c8..510c63962053b 100644
--- a/clang/include/clang/AST/ASTContext.h
+++ b/clang/include/clang/AST/ASTContext.h
@@ -211,7 +211,7 @@ class ASTContext : public RefCountedBase {
   mutable SmallVector Types;
   mutable llvm::FoldingSet ExtQualNodes;
   mutable llvm::FoldingSet ComplexTypes;
-  mutable llvm::FoldingSet PointerTypes;
+  mutable llvm::FoldingSet PointerTypes{GeneralTypesLog2InitSize};
   mutable llvm::FoldingSet AdjustedTypes;
   mutable llvm::FoldingSet BlockPointerTypes;
   mutable llvm::FoldingSet LValueReferenceTypes;
@@ -243,9 +243,10 @@ class ASTContext : public RefCountedBase {
 SubstTemplateTypeParmPackTypes;
   mutable llvm::ContextualFoldingSet
 TemplateSpecializationTypes;
-  mutable llvm::FoldingSet ParenTypes;
+  mutable llvm::FoldingSet ParenTypes{GeneralTypesLog2InitSize};
   mutable llvm::FoldingSet UsingTypes;
-  mutable llvm::FoldingSet ElaboratedTypes;
+  mutable llvm::FoldingSet ElaboratedTypes{
+  GeneralTypesLog2InitSize};
   mutable llvm::FoldingSet DependentNameTypes;
   mutable llvm::ContextualFoldingSet
@@ -466,6 +467,10 @@ class ASTContext : public RefCountedBase {
   };
   llvm::DenseMap ModuleInitializers;
 
+  static constexpr unsigned ConstantArrayTypesLog2InitSize = 8;
+  static constexpr unsigned GeneralTypesLog2InitSize = 9;
+  static constexpr unsigned FunctionProtoTypesLog2InitSize = 12;
+
   ASTContext &this_() { return *this; }
 
 public:

diff  --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index f2ac57465398b..527c8b56159e0 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -973,7 +973,8 @@ static bool isAddrSpaceMapManglingEnabled(const TargetInfo 
&TI,
 ASTContext::ASTContext(LangOptions &LOpts, SourceManager &SM,
IdentifierTable &idents, SelectorTable &sels,
Builtin::Context &builtins, TranslationUnitKind TUKind)
-: ConstantArrayTypes(this_()), FunctionProtoTypes(this_()),
+: ConstantArrayTypes(this_(), ConstantArrayTypesLog2InitSize),
+  FunctionProtoTypes(this_(), FunctionProtoTypesLog2InitSize),
   TemplateSpecializationTypes(this_()),
   DependentTemplateSpecializationTypes(this_()), AutoTypes(this_()),
   SubstTemplateTemplateParmPacks(this_()),

diff  --git a/clang/lib/CodeGen/CodeGenTypes.h 
b/clang/lib/CodeGen/CodeGenTypes.h
index 28b8312229439..05aae88ba59ed 100644
--- a/clang/lib/CodeGen/CodeGenTypes.h
+++ b/clang/lib/CodeGen/CodeGenTypes.h
@@ -76,7 +76,7 @@ class CodeGenTypes {
   llvm::DenseMap RecordDeclTypes;
 
   /// Hold memoized CGFunctionInfo results.
-  llvm::FoldingSet FunctionInfos;
+  llvm::FoldingSet FunctionInfos{FunctionInfosLog2InitSize};
 
   /// This set keeps track of records that we're currently converting
   /// to an IR type.  For example, when converting:
@@ -98,6 +98,7 @@ class CodeGenTypes {
 
   llvm::SmallSet RecordsWithOpaqueMemberPointers;
 
+  static constexpr unsigned FunctionInfosLog2InitSize = 9;
   /// Helper for ConvertType.
   llvm::Type *ConvertFunctionTypeInternal(QualType FT);
 



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/c

[clang] f54ca1f - [NFC][Coroutines] Add regression test for heap allocation elision optimization

2022-07-11 Thread Dawid Jurczak via cfe-commits

Author: Dawid Jurczak
Date: 2022-07-11T16:41:05+02:00
New Revision: f54ca1f632646ee9a2a052a28e93ffb0bcc4340d

URL: 
https://github.com/llvm/llvm-project/commit/f54ca1f632646ee9a2a052a28e93ffb0bcc4340d
DIFF: 
https://github.com/llvm/llvm-project/commit/f54ca1f632646ee9a2a052a28e93ffb0bcc4340d.diff

LOG: [NFC][Coroutines] Add regression test for heap allocation elision 
optimization

Recently the C++ snippet included in this patch popped up at least twice in
different regression contexts:
https://github.com/llvm/llvm-project/issues/56262 and
https://reviews.llvm.org/D123300
It appears that Clang users rely on HALO, so adding the C++ example, which
originally comes from Gor Nishanov, to the tests
should help avoid similar regressions in the future.

Differential Revision: https://reviews.llvm.org/D129279

Added: 
clang/test/CodeGenCoroutines/Inputs/numeric.h
clang/test/CodeGenCoroutines/coro-halo.cpp

Modified: 


Removed: 




diff  --git a/clang/test/CodeGenCoroutines/Inputs/numeric.h 
b/clang/test/CodeGenCoroutines/Inputs/numeric.h
new file mode 100644
index 0..81f319eb7d052
--- /dev/null
+++ b/clang/test/CodeGenCoroutines/Inputs/numeric.h
@@ -0,0 +1,10 @@
+#pragma once
+
+namespace std {
+template 
+T accumulate(InputIterator first, InputIterator last, T init) {
+  for (; first != last; ++first)
+init = init + *first;
+  return init;
+}
+} // namespace std

diff  --git a/clang/test/CodeGenCoroutines/coro-halo.cpp 
b/clang/test/CodeGenCoroutines/coro-halo.cpp
new file mode 100644
index 0..14ae53c41aebe
--- /dev/null
+++ b/clang/test/CodeGenCoroutines/coro-halo.cpp
@@ -0,0 +1,102 @@
+// This tests that the coroutine heap allocation elision optimization could 
happen succesfully.
+// RUN: %clang_cc1 -no-opaque-pointers -triple x86_64-unknown-linux-gnu 
-std=c++20 -O2 -emit-llvm %s -o - | FileCheck %s
+
+#include "Inputs/coroutine.h"
+#include "Inputs/numeric.h"
+
+template  struct generator {
+  struct promise_type {
+T current_value;
+std::suspend_always yield_value(T value) {
+  this->current_value = value;
+  return {};
+}
+std::suspend_always initial_suspend() { return {}; }
+std::suspend_always final_suspend() noexcept { return {}; }
+generator get_return_object() { return generator{this}; };
+void unhandled_exception() {}
+void return_void() {}
+  };
+
+  struct iterator {
+std::coroutine_handle _Coro;
+bool _Done;
+
+iterator(std::coroutine_handle Coro, bool Done)
+: _Coro(Coro), _Done(Done) {}
+
+iterator &operator++() {
+  _Coro.resume();
+  _Done = _Coro.done();
+  return *this;
+}
+
+bool operator==(iterator const &_Right) const {
+  return _Done == _Right._Done;
+}
+
+bool operator!=(iterator const &_Right) const { return !(*this == _Right); 
}
+T const &operator*() const { return _Coro.promise().current_value; }
+T const *operator->() const { return &(operator*()); }
+  };
+
+  iterator begin() {
+p.resume();
+return {p, p.done()};
+  }
+
+  iterator end() { return {p, true}; }
+
+  generator(generator const &) = delete;
+  generator(generator &&rhs) : p(rhs.p) { rhs.p = nullptr; }
+
+  ~generator() {
+if (p)
+  p.destroy();
+  }
+
+private:
+  explicit generator(promise_type *p)
+  : p(std::coroutine_handle::from_promise(*p)) {}
+
+  std::coroutine_handle p;
+};
+
+template 
+generator seq() {
+  for (T i = {};; ++i)
+co_yield i;
+}
+
+template 
+generator take_until(generator &g, T limit) {
+  for (auto &&v : g)
+if (v < limit)
+  co_yield v;
+else
+  break;
+}
+
+template 
+generator multiply(generator &g, T factor) {
+  for (auto &&v : g)
+co_yield v *factor;
+}
+
+template 
+generator add(generator &g, T adder) {
+  for (auto &&v : g)
+co_yield v + adder;
+}
+
+int main() {
+  auto s = seq();
+  auto t = take_until(s, 10);
+  auto m = multiply(t, 2);
+  auto a = add(m, 110);
+  return std::accumulate(a.begin(), a.end(), 0);
+}
+
+// CHECK-LABEL: define{{.*}} i32 @main(
+//   CHECK: ret i32 1190
+//   CHECK-NOT: call{{.*}}_Znwm
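
The constant in the CHECK line can be reproduced without coroutines. The sketch
below is a plain-loop equivalent of the generator pipeline in main(), under the
assumption that seq starts at 0; pipelineSum is a hypothetical helper and is not
part of the test.

```cpp
#include <cassert>

// Plain-loop equivalent of
//   seq() -> take_until(Limit) -> multiply(Factor) -> add(Adder) -> accumulate
// assuming seq yields 0, 1, 2, ... (hypothetical helper, not in the test).
inline int pipelineSum(int Limit, int Factor, int Adder) {
  assert(Limit >= 0 && "take_until stops at the first value >= Limit");
  int Sum = 0;
  for (int V = 0; V < Limit; ++V)
    Sum += V * Factor + Adder; // multiply, then add, then accumulate
  return Sum;
}
```

Calling pipelineSum(10, 2, 110) sums 110, 112, ..., 128, which is the value the
optimizer is expected to fold main() down to once the coroutine frames have been
elided.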





[clang] b88ca61 - [NFC][CodeGen] Use llvm::DenseMap for DeferredDecls

2022-01-27 Thread Dawid Jurczak via cfe-commits

Author: Dawid Jurczak
Date: 2022-01-27T10:57:48+01:00
New Revision: b88ca619d33bc74e1776d879e43c6fc812ac4ff5

URL: 
https://github.com/llvm/llvm-project/commit/b88ca619d33bc74e1776d879e43c6fc812ac4ff5
DIFF: 
https://github.com/llvm/llvm-project/commit/b88ca619d33bc74e1776d879e43c6fc812ac4ff5.diff

LOG: [NFC][CodeGen] Use llvm::DenseMap for DeferredDecls

CodeGenModule::DeferredDecls std::map::operator[] seems to be hot, especially
when generating code for huge compilation units.
In such cases using DenseMap instead gives an observable compile-time
improvement.
The patch was tested on a Linux build with the default config acting as the
benchmark.
The build was performed on isolated CPU cores in a quiet x86-64 Linux
environment, following the rules at
https://llvm.org/docs/Benchmarking.html#linux.
The differences in compile-time statistics reported by perf and time before and
after the change are:
instructions -0.15%, cycles -0.7%, max-rss +0.65%.
Using StringMap instead of DenseMap doesn't bring any visible gains.
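
As a standard-library-only illustration of the access pattern, the sketch below
runs the same operator[] loop against an ordered tree map and a hash map. It is
only an analogy: std::unordered_map stands in for llvm::DenseMap, which is a
different implementation, and registerAndCount and the sample names are
hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a deferred-declaration table: name -> id.
// MapT can be std::map (ordered tree) or std::unordered_map (hash table);
// the caller-visible behaviour is the same, only the lookup cost differs.
template <class MapT>
int registerAndCount(const std::vector<std::string> &Names) {
  MapT Deferred;
  int Id = 0;
  for (const std::string &N : Names)
    Deferred[N] = Id++; // operator[] is the call the log describes as hot
  return static_cast<int>(Deferred.size());
}
```

Both instantiations return the number of distinct names, which is why a swap of
this kind can be labelled NFC as long as nothing depends on iteration order.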

Differential Revision: https://reviews.llvm.org/D118169

Added: 


Modified: 
clang/lib/CodeGen/CodeGenModule.h

Removed: 




diff  --git a/clang/lib/CodeGen/CodeGenModule.h 
b/clang/lib/CodeGen/CodeGenModule.h
index 9ae9d624b925d..e803022508a4c 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -335,7 +335,7 @@ class CodeGenModule : public CodeGenTypeCache {
   /// for emission and therefore should only be output if they are actually
   /// used. If a decl is in this, then it is known to have not been referenced
   /// yet.
-  std::map DeferredDecls;
+  llvm::DenseMap DeferredDecls;
 
   /// This is a list of deferred decls which we have seen that *are* actually
   /// referenced. These get code generated when the module is done.





[clang] fbe38a7 - [NFC][Lexer] Make access to LangOpts more consistent

2022-02-23 Thread Dawid Jurczak via cfe-commits

Author: Dawid Jurczak
Date: 2022-02-23T12:46:13+01:00
New Revision: fbe38a784e2852b22f5a44ad417e071ff583d57d

URL: 
https://github.com/llvm/llvm-project/commit/fbe38a784e2852b22f5a44ad417e071ff583d57d
DIFF: 
https://github.com/llvm/llvm-project/commit/fbe38a784e2852b22f5a44ad417e071ff583d57d.diff

LOG: [NFC][Lexer] Make access to LangOpts more consistent

Before this change, Lexer::LangOpts was, for no good reason, sometimes accessed
through the getter and at other times read directly in Lexer functions.
Since getLangOpts is a bit more verbose, prefer direct access to the LangOpts
member where possible.

Differential Revision: https://reviews.llvm.org/D120333

Added: 


Modified: 
clang/lib/Lex/Lexer.cpp

Removed: 




diff  --git a/clang/lib/Lex/Lexer.cpp b/clang/lib/Lex/Lexer.cpp
index a180bba365cf..4f8910e7ac9e 100644
--- a/clang/lib/Lex/Lexer.cpp
+++ b/clang/lib/Lex/Lexer.cpp
@@ -1881,7 +1881,7 @@ bool Lexer::LexNumericConstant(Token &Result, const char 
*CurPtr) {
 if (!LangOpts.C99) {
   if (!isHexaLiteral(BufferPtr, LangOpts))
 IsHexFloat = false;
-  else if (!getLangOpts().CPlusPlus17 &&
+  else if (!LangOpts.CPlusPlus17 &&
std::find(BufferPtr, CurPtr, '_') != CurPtr)
 IsHexFloat = false;
 }
@@ -1890,12 +1890,12 @@ bool Lexer::LexNumericConstant(Token &Result, const 
char *CurPtr) {
   }
 
   // If we have a digit separator, continue.
-  if (C == '\'' && (getLangOpts().CPlusPlus14 || getLangOpts().C2x)) {
+  if (C == '\'' && (LangOpts.CPlusPlus14 || LangOpts.C2x)) {
 unsigned NextSize;
-char Next = getCharAndSizeNoWarn(CurPtr + Size, NextSize, getLangOpts());
+char Next = getCharAndSizeNoWarn(CurPtr + Size, NextSize, LangOpts);
 if (isAsciiIdentifierContinue(Next)) {
   if (!isLexingRawMode())
-Diag(CurPtr, getLangOpts().CPlusPlus
+Diag(CurPtr, LangOpts.CPlusPlus
  ? diag::warn_cxx11_compat_digit_separator
  : diag::warn_c2x_compat_digit_separator);
   CurPtr = ConsumeChar(CurPtr, Size, Result);
@@ -1921,7 +1921,7 @@ bool Lexer::LexNumericConstant(Token &Result, const char 
*CurPtr) {
 /// in C++11, or warn on a ud-suffix in C++98.
 const char *Lexer::LexUDSuffix(Token &Result, const char *CurPtr,
bool IsStringLiteral) {
-  assert(getLangOpts().CPlusPlus);
+  assert(LangOpts.CPlusPlus);
 
   // Maximally munch an identifier.
   unsigned Size;
@@ -1937,7 +1937,7 @@ const char *Lexer::LexUDSuffix(Token &Result, const char 
*CurPtr,
   return CurPtr;
   }
 
-  if (!getLangOpts().CPlusPlus11) {
+  if (!LangOpts.CPlusPlus11) {
 if (!isLexingRawMode())
   Diag(CurPtr,
C == '_' ? diag::warn_cxx11_compat_user_defined_literal
@@ -1955,7 +1955,7 @@ const char *Lexer::LexUDSuffix(Token &Result, const char 
*CurPtr,
 bool IsUDSuffix = false;
 if (C == '_')
   IsUDSuffix = true;
-else if (IsStringLiteral && getLangOpts().CPlusPlus14) {
+else if (IsStringLiteral && LangOpts.CPlusPlus14) {
   // In C++1y, we need to look ahead a few characters to see if this is a
   // valid suffix for a string literal or a numeric literal (this could be
   // the 'operator""if' defining a numeric literal operator).
@@ -1965,13 +1965,12 @@ const char *Lexer::LexUDSuffix(Token &Result, const 
char *CurPtr,
   unsigned Chars = 1;
   while (true) {
 unsigned NextSize;
-char Next = getCharAndSizeNoWarn(CurPtr + Consumed, NextSize,
- getLangOpts());
+char Next = getCharAndSizeNoWarn(CurPtr + Consumed, NextSize, 
LangOpts);
 if (!isAsciiIdentifierContinue(Next)) {
   // End of suffix. Check whether this is on the allowed list.
   const StringRef CompleteSuffix(Buffer, Chars);
-  IsUDSuffix = StringLiteralParser::isValidUDSuffix(getLangOpts(),
-CompleteSuffix);
+  IsUDSuffix =
+  StringLiteralParser::isValidUDSuffix(LangOpts, CompleteSuffix);
   break;
 }
 
@@ -1986,10 +1985,10 @@ const char *Lexer::LexUDSuffix(Token &Result, const 
char *CurPtr,
 
 if (!IsUDSuffix) {
   if (!isLexingRawMode())
-Diag(CurPtr, getLangOpts().MSVCCompat
+Diag(CurPtr, LangOpts.MSVCCompat
  ? diag::ext_ms_reserved_user_defined_literal
  : diag::ext_reserved_user_defined_literal)
-  << FixItHint::CreateInsertion(getSourceLocation(CurPtr), " ");
+<< FixItHint::CreateInsertion(getSourceLocation(CurPtr), " ");
   return CurPtr;
 }
 
@@ -2022,9 +2021,8 @@ bool Lexer::LexStringLiteral(Token &Result, const char 
*CurPtr,
   (Kind == tok::utf8_string_literal ||
Kind == tok::utf16_string_literal ||
Kind == tok::utf32_string_literal))
-Diag(BufferPtr

[clang] a64d3c6 - [NFC][Lexer] Make Lexer::LangOpts const reference

2022-02-28 Thread Dawid Jurczak via cfe-commits

Author: Dawid Jurczak
Date: 2022-02-28T15:42:19+01:00
New Revision: a64d3c602fb7e533855a64cb4d9fec77ea0a079e

URL: 
https://github.com/llvm/llvm-project/commit/a64d3c602fb7e533855a64cb4d9fec77ea0a079e
DIFF: 
https://github.com/llvm/llvm-project/commit/a64d3c602fb7e533855a64cb4d9fec77ea0a079e.diff

LOG: [NFC][Lexer] Make Lexer::LangOpts const reference

This change can be seen as code cleanup, but the motivation is more performance
related.
While browsing perf reports captured during a Linux build we can notice an
unusually large portion of instructions executed in the std::vector<std::string>
copy constructor, for example:

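0.59%  0.58%  clang-14  clang-14  [.]
std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::vector

or even:

1.42%  0.26%  clang  clang-14  [.]
clang::LangOptions::LangOptions
   |
    --1.16%--clang::LangOptions::LangOptions
          |
           --0.74%--std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::vector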

After more digging we can see that the relevant LangOptions std::vector members
(*Files, ModuleFeatures and NoBuiltinFuncs)
are copy-constructed when the Lexer::LangOpts field is initialized in the
member initializer list:

Lexer::Lexer(..., const LangOptions &langOpts, ...)
: ..., LangOpts(langOpts),

Since the LangOptions copy constructor is called by Lexer(..., const LangOptions
&LangOpts,...) and local Lexer objects are created thousands of times
(in Lexer::getRawToken, Preprocessor::EnterSourceFile and more) while a single
module is processed in the frontend, the std::vector copy constructors become
surprisingly hot.

Unfortunately, even though in the current Lexer implementation the mentioned
std::vector members are unused and most of the time empty,
no compiler is smart enough to optimize their copy constructors out, even with
LTO enabled (take a look at the test assembly): https://godbolt.org/z/hdoxPfMYY
However, there is a simple way to fix this. Since Lexer doesn't access *Files,
ModuleFeatures, NoBuiltinFuncs or any other LangOptions fields (only those of
LangOptionsBase),
we can get rid of the redundant copy constructor assembly by changing the
LangOpts type to the more appropriate const LangOptions reference:
https://godbolt.org/z/fP7de9176

Additionally, we need to store LineComment outside LangOpts because it is
written in the SkipLineComment function.
FormatTokenLexer also needs to be adjusted a bit to avoid lifetime issues
related to passing a local LangOpts reference to Lexer.

After this change I can see more than 1% speedup in some of my microbenchmarks
when using a Clang release binary built with LTO.
For the Linux build the gains are less significant but still nice, at the level
of a 0.4% to 0.5% drop in instructions.
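
The effect can be reproduced in miniature without Clang. In the sketch below
every type is a hypothetical stand-in: Options plays the role of LangOptions,
LexerByValue the old Lexer and LexerByRef the new one, including the separately
stored LineComment flag. The copy counter exists only for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Counts Options copy-constructor calls (illustration only).
static int OptionsCopies = 0;

// Hypothetical stand-in for clang::LangOptions: flags plus a vector member.
struct Options {
  bool LineComment = true;
  std::vector<std::string> ModuleFeatures;
  Options() = default;
  Options(const Options &O)
      : LineComment(O.LineComment), ModuleFeatures(O.ModuleFeatures) {
    ++OptionsCopies;
  }
};

// Old layout: the lexer owns a full copy of the options.
struct LexerByValue {
  Options Opts;
  explicit LexerByValue(const Options &O) : Opts(O) {}
};

// New layout: the lexer only refers to the options and keeps the one flag
// it needs to modify as its own member.
struct LexerByRef {
  const Options &Opts;
  bool LineComment;
  explicit LexerByRef(const Options &O)
      : Opts(O), LineComment(O.LineComment) {}
};

// Constructs N short-lived lexers and reports how many Options copies it cost.
template <class LexerT> int copiesFor(const Options &O, int N) {
  OptionsCopies = 0;
  for (int I = 0; I < N; ++I) {
    LexerT L(O);
    (void)L;
  }
  return OptionsCopies;
}
```

With the by-value layout each constructed lexer pays for one Options copy,
vector members included; with the reference layout none are made, at the price
of requiring the referenced object to outlive the lexer.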

Differential Revision: https://reviews.llvm.org/D120334

Added: 


Modified: 
clang/include/clang/Lex/Lexer.h
clang/lib/Format/FormatTokenLexer.cpp
clang/lib/Format/FormatTokenLexer.h
clang/lib/Lex/Lexer.cpp

Removed: 




diff  --git a/clang/include/clang/Lex/Lexer.h b/clang/include/clang/Lex/Lexer.h
index ba1706b1d13e0..e4dbb6b4af6f0 100644
--- a/clang/include/clang/Lex/Lexer.h
+++ b/clang/include/clang/Lex/Lexer.h
@@ -13,7 +13,6 @@
 #ifndef LLVM_CLANG_LEX_LEXER_H
 #define LLVM_CLANG_LEX_LEXER_H
 
-#include "clang/Basic/LangOptions.h"
 #include "clang/Basic/SourceLocation.h"
 #include "clang/Basic/TokenKinds.h"
 #include "clang/Lex/PreprocessorLexer.h"
@@ -36,6 +35,7 @@ namespace clang {
 class DiagnosticBuilder;
 class Preprocessor;
 class SourceManager;
+class LangOptions;
 
 /// ConflictMarkerKind - Kinds of conflict marker which the lexer might be
 /// recovering from.
@@ -90,8 +90,18 @@ class Lexer : public PreprocessorLexer {
   // Location for start of file.
   SourceLocation FileLoc;
 
-  // LangOpts enabled by this language (cache).
-  LangOptions LangOpts;
+  // LangOpts enabled by this language.
+  // Storing LangOptions as reference here is important from performance point
+  // of view. Lack of reference means that LangOptions copy constructor would 
be
+  // called by Lexer(..., const LangOptions &LangOpts,...). Given that local
+  // Lexer objects are created thousands times (in Lexer::getRawToken,
+  // Preprocessor::EnterSourceFile and other places) during single module
+  // processing in frontend it would make std::vector copy
+  // constructors surprisingly hot.
+  const LangOptions &LangOpts;
+
+  // True if '//' line comments are enabled.
+  bool LineComment;
 
   // True if lexer for _Pragma handling.
   bool Is_PragmaLexer;

diff  --git a/clang/lib/Format/FormatTokenLexer.cpp 
b/clang/lib/Format/FormatTokenLexer.cpp
index a48db4ef6d90f..187b30fd55a7e 100644
--- a/clang/lib/Format/FormatTokenLexer.cpp
+++ b/clang/lib/Format/FormatTokenLexer.cpp
@@ -28,13 +28,13 @@ FormatTokenLexer::FormatTokenLexer(
 llvm::SpecificBumpPtrAllocator &Alloc

[clang] b3e2dac - [NFC] Don't pass temporary LangOptions to Lexer

2022-02-28 Thread Dawid Jurczak via cfe-commits

Author: Dawid Jurczak
Date: 2022-02-28T20:43:28+01:00
New Revision: b3e2dac27c0cd4562e4ece5d5e24a1e59705c746

URL: 
https://github.com/llvm/llvm-project/commit/b3e2dac27c0cd4562e4ece5d5e24a1e59705c746
DIFF: 
https://github.com/llvm/llvm-project/commit/b3e2dac27c0cd4562e4ece5d5e24a1e59705c746.diff

LOG: [NFC] Don't pass temporary LangOptions to Lexer

Since https://reviews.llvm.org/D120334 we shouldn't pass a temporary LangOptions
to Lexer.
This change fixes a stack-use-after-scope UB in LocalizationChecker found by the
sanitizer-x86_64-linux-fast buildbot
and resolves a similar issue in HeaderIncludes.
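
The lifetime rule behind this fix can be sketched in isolation. All names below
are hypothetical stand-ins, and the deleted rvalue constructor is an extra
safeguard added for illustration; it is not something this commit introduces.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for clang::LangOptions.
struct Options {
  bool Trigraphs = false;
};

// Hypothetical stand-in for a lexer that stores a reference to its options.
class RefLexer {
  const Options &Opts;

public:
  explicit RefLexer(const Options &O) : Opts(O) {}
  // Illustration only (not in the commit): reject temporaries at compile
  // time, since the stored reference would dangle after the constructor.
  explicit RefLexer(Options &&) = delete;
  bool trigraphs() const { return Opts.Trigraphs; }
};

// Passing a temporary is now a compile-time error rather than UB.
static_assert(!std::is_constructible<RefLexer, Options &&>::value,
              "temporaries must not bind to the stored reference");

inline Options makeOptions() {
  Options O;
  O.Trigraphs = true;
  return O;
}

// Safe pattern, mirroring the diff below: give the options a name so they
// outlive the lexer that refers to them.
inline bool lexWithNamedOptions() {
  Options Opts = makeOptions();
  RefLexer L(Opts);
  return L.trigraphs();
}
```

The named local in lexWithNamedOptions corresponds to the LangOptions LangOpts
variables introduced in both modified files.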

Added: 


Modified: 
clang/lib/StaticAnalyzer/Checkers/LocalizationChecker.cpp
clang/lib/Tooling/Inclusions/HeaderIncludes.cpp

Removed: 




diff  --git a/clang/lib/StaticAnalyzer/Checkers/LocalizationChecker.cpp 
b/clang/lib/StaticAnalyzer/Checkers/LocalizationChecker.cpp
index b57c5dc6de562..361abf9b73493 100644
--- a/clang/lib/StaticAnalyzer/Checkers/LocalizationChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/LocalizationChecker.cpp
@@ -1145,8 +1145,8 @@ void 
EmptyLocalizationContextChecker::MethodCrawler::VisitObjCMessageExpr(
   Mgr.getSourceManager().getBufferOrNone(SLInfo.first, SL);
   if (!BF)
 return;
-
-  Lexer TheLexer(SL, LangOptions(), BF->getBufferStart(),
+  LangOptions LangOpts;
+  Lexer TheLexer(SL, LangOpts, BF->getBufferStart(),
  BF->getBufferStart() + SLInfo.second, BF->getBufferEnd());
 
   Token I;

diff  --git a/clang/lib/Tooling/Inclusions/HeaderIncludes.cpp 
b/clang/lib/Tooling/Inclusions/HeaderIncludes.cpp
index fbceb26c39c7c..fc8773e60c581 100644
--- a/clang/lib/Tooling/Inclusions/HeaderIncludes.cpp
+++ b/clang/lib/Tooling/Inclusions/HeaderIncludes.cpp
@@ -43,8 +43,9 @@ unsigned getOffsetAfterTokenSequence(
 GetOffsetAfterSequence) {
   SourceManagerForFile VirtualSM(FileName, Code);
   SourceManager &SM = VirtualSM.get();
+  LangOptions LangOpts = createLangOpts();
   Lexer Lex(SM.getMainFileID(), SM.getBufferOrFake(SM.getMainFileID()), SM,
-createLangOpts());
+LangOpts);
   Token Tok;
   // Get the first token.
   Lex.LexFromRawLexer(Tok);





[clang] d813116 - [NFC][Lexer] Remove getLangOpts function from Lexer

2022-03-02 Thread Dawid Jurczak via cfe-commits

Author: Dawid Jurczak
Date: 2022-03-02T11:17:05+01:00
New Revision: d813116c9deaa960ddcce5b4b161ea589d6e9a34

URL: 
https://github.com/llvm/llvm-project/commit/d813116c9deaa960ddcce5b4b161ea589d6e9a34
DIFF: 
https://github.com/llvm/llvm-project/commit/d813116c9deaa960ddcce5b4b161ea589d6e9a34.diff

LOG: [NFC][Lexer] Remove getLangOpts function from Lexer

Given that there is only one external user of Lexer::getLangOpts,
we can remove the getter entirely without much pain.

Differential Revision: https://reviews.llvm.org/D120404

Added: 


Modified: 
clang/include/clang/Lex/Lexer.h
clang/lib/Lex/Lexer.cpp
clang/lib/Lex/ModuleMap.cpp

Removed: 




diff  --git a/clang/include/clang/Lex/Lexer.h b/clang/include/clang/Lex/Lexer.h
index e4dbb6b4af6f0..c64a5756ac419 100644
--- a/clang/include/clang/Lex/Lexer.h
+++ b/clang/include/clang/Lex/Lexer.h
@@ -183,10 +183,6 @@ class Lexer : public PreprocessorLexer {
SourceLocation ExpansionLocEnd,
unsigned TokLen, Preprocessor &PP);
 
-  /// getLangOpts - Return the language features currently enabled.
-  /// NOTE: this lexer modifies features as a file is parsed!
-  const LangOptions &getLangOpts() const { return LangOpts; }
-
   /// getFileLoc - Return the File Location for the file we are lexing out of.
   /// The physical location encodes the location where the characters come 
from,
   /// the virtual location encodes where we should *claim* the characters came

diff  --git a/clang/lib/Lex/Lexer.cpp b/clang/lib/Lex/Lexer.cpp
index 63ad43842adde..6e8072fb1b2d9 100644
--- a/clang/lib/Lex/Lexer.cpp
+++ b/clang/lib/Lex/Lexer.cpp
@@ -1194,11 +1194,11 @@ static char GetTrigraphCharForLetter(char Letter) {
 /// prefixed with ??, emit a trigraph warning.  If trigraphs are enabled,
 /// return the result character.  Finally, emit a warning about trigraph use
 /// whether trigraphs are enabled or not.
-static char DecodeTrigraphChar(const char *CP, Lexer *L) {
+static char DecodeTrigraphChar(const char *CP, Lexer *L, bool Trigraphs) {
   char Res = GetTrigraphCharForLetter(*CP);
   if (!Res || !L) return Res;
 
-  if (!L->getLangOpts().Trigraphs) {
+  if (!Trigraphs) {
 if (!L->isLexingRawMode())
   L->Diag(CP-2, diag::trigraph_ignored);
 return 0;
@@ -1372,7 +1372,8 @@ char Lexer::getCharAndSizeSlow(const char *Ptr, unsigned 
&Size,
   if (Ptr[0] == '?' && Ptr[1] == '?') {
 // If this is actually a legal trigraph (not something like "??x"), emit
 // a trigraph warning.  If so, and if trigraphs are enabled, return it.
-if (char C = DecodeTrigraphChar(Ptr+2, Tok ? this : nullptr)) {
+if (char C = DecodeTrigraphChar(Ptr + 2, Tok ? this : nullptr,
+LangOpts.Trigraphs)) {
   // Remember that this token needs to be cleaned.
   if (Tok) Tok->setFlag(Token::NeedsCleaning);
 
@@ -2543,8 +2544,8 @@ bool Lexer::SaveLineComment(Token &Result, const char 
*CurPtr) {
 /// isBlockCommentEndOfEscapedNewLine - Return true if the specified newline
 /// character (either \\n or \\r) is part of an escaped newline sequence.  
Issue
 /// a diagnostic if so.  We know that the newline is inside of a block comment.
-static bool isEndOfBlockCommentWithEscapedNewLine(const char *CurPtr,
-  Lexer *L) {
+static bool isEndOfBlockCommentWithEscapedNewLine(const char *CurPtr, Lexer *L,
+  bool Trigraphs) {
   assert(CurPtr[0] == '\n' || CurPtr[0] == '\r');
 
   // Position of the first trigraph in the ending sequence.
@@ -2595,7 +2596,7 @@ static bool isEndOfBlockCommentWithEscapedNewLine(const 
char *CurPtr,
   if (TrigraphPos) {
 // If no trigraphs are enabled, warn that we ignored this trigraph and
 // ignore this * character.
-if (!L->getLangOpts().Trigraphs) {
+if (!Trigraphs) {
   if (!L->isLexingRawMode())
 L->Diag(TrigraphPos, diag::trigraph_ignored_block_comment);
   return false;
@@ -2725,7 +2726,8 @@ bool Lexer::SkipBlockComment(Token &Result, const char 
*CurPtr,
 break;
 
   if ((CurPtr[-2] == '\n' || CurPtr[-2] == '\r')) {
-if (isEndOfBlockCommentWithEscapedNewLine(CurPtr-2, this)) {
+if (isEndOfBlockCommentWithEscapedNewLine(CurPtr - 2, this,
+  LangOpts.Trigraphs)) {
   // We found the final */, though it had an escaped newline between 
the
   // * and /.  We're done!
   break;

diff  --git a/clang/lib/Lex/ModuleMap.cpp b/clang/lib/Lex/ModuleMap.cpp
index 824b2bb192909..a5eca402c43bf 100644
--- a/clang/lib/Lex/ModuleMap.cpp
+++ b/clang/lib/Lex/ModuleMap.cpp
@@ -1625,7 +1625,7 @@ SourceLocation ModuleMapParser::consumeToken() {
 SpellingBuffer.resize(LToken.getLength() + 1);
 const char *Start = SpellingBuffer.data();
 un