Author: Thibault Monnier
Date: 2026-02-26T19:56:47+01:00
New Revision: f6917fa6c90a9d55d3bfd9d8aa4583834771565d

URL: 
https://github.com/llvm/llvm-project/commit/f6917fa6c90a9d55d3bfd9d8aa4583834771565d
DIFF: 
https://github.com/llvm/llvm-project/commit/f6917fa6c90a9d55d3bfd9d8aa4583834771565d.diff

LOG: [Clang][Lexer][Performance] Optimize Lexer whitespace skipping logic 
(#180819)

... by extracting the check for the space character and marking it as
`LLVM_LIKELY`. This improves performance because the space is by far
the most common horizontal whitespace character, so in most cases this
change replaces a lookup table check with a simple comparison,
reducing latency and pressure on the cache.
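
As a rough illustration of the idea, here is a minimal, self-contained sketch. It is not Clang's code: the table `kCharTableSketch`, the flag `kHorizontalWSSketch`, and both function names are hypothetical, and the set of characters treated as horizontal whitespace (space, `\t`, `\f`, `\v`) is an assumption of the sketch. The space comparison runs first, so the table is only consulted for other characters.

```cpp
#include <array>
#include <cstdint>

// Hypothetical flag and table, standing in for a character-class lookup
// table. This is an assumption for illustration, not Clang's actual table.
constexpr std::uint8_t kHorizontalWSSketch = 0x01;

static const std::array<std::uint8_t, 256> kCharTableSketch = [] {
  std::array<std::uint8_t, 256> Table{};  // all entries start at zero
  Table[static_cast<unsigned char>(' ')] = kHorizontalWSSketch;
  Table[static_cast<unsigned char>('\t')] = kHorizontalWSSketch;
  Table[static_cast<unsigned char>('\f')] = kHorizontalWSSketch;
  Table[static_cast<unsigned char>('\v')] = kHorizontalWSSketch;
  return Table;
}();

// Table-based classification: one memory load plus a mask test.
inline bool isHorizontalWhitespaceSketch(unsigned char C) {
  return (kCharTableSketch[C] & kHorizontalWSSketch) != 0;
}

// Skips horizontal whitespace in a NUL-terminated buffer. The cheap
// comparison against ' ' comes first, so the table lookup only happens
// for characters that are not a plain space.
inline const char *skipHorizontalWhitespaceSketch(const char *CurPtr) {
  while (*CurPtr == ' ' ||
         isHorizontalWhitespaceSketch(static_cast<unsigned char>(*CurPtr)))
    ++CurPtr;
  return CurPtr;
}
```

Calling `skipHorizontalWhitespaceSketch("  \t x")` returns a pointer to the `x`. The loop stops at the terminating NUL because that table entry is zero.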

This does not reduce instruction count, as a lookup table access and a
comparison are each a single instruction. However, it _does_ reduce
cycles consistently, by around `0.2`-`0.3`%:
[benchmark](https://llvm-compile-time-tracker.com/compare.php?from=3192fe2c7b08912cc72c86471a593165b615dc28&to=faa899a6ce518c1176f2bf59f199eb42e59d840e&stat=cycles).
I tested this locally and can confirm this is not noise (at least not
entirely; it does feel odd that this impacts `O3` more than `O0`...),
as I measured almost `2`% faster preprocessing (PP) speed in my tests.

Added: 
    

Modified: 
    clang/lib/Lex/Lexer.cpp

Removed: 
    


################################################################################
diff  --git a/clang/lib/Lex/Lexer.cpp b/clang/lib/Lex/Lexer.cpp
index 92c3046a6fd19..b100fc29fcd69 100644
--- a/clang/lib/Lex/Lexer.cpp
+++ b/clang/lib/Lex/Lexer.cpp
@@ -2533,8 +2533,8 @@ bool Lexer::SkipWhitespace(Token &Result, const char *CurPtr) {
 
   // Skip consecutive spaces efficiently.
   while (true) {
-    // Skip horizontal whitespace very aggressively.
-    while (isHorizontalWhitespace(Char))
+    // Skip horizontal whitespace, especially space, very aggressively.
+    while (Char == ' ' || isHorizontalWhitespace(Char))
       Char = *++CurPtr;
 
     // Otherwise if we have something other than whitespace, we're done.
@@ -3757,10 +3757,12 @@ bool Lexer::LexTokenInternal(Token &Result) {
   const char *CurPtr = BufferPtr;
 
   // Small amounts of horizontal whitespace is very common between tokens.
-  if (isHorizontalWhitespace(*CurPtr)) {
+  // Check for space character separately to skip the expensive
+  // isHorizontalWhitespace() check
+  if (*CurPtr == ' ' || isHorizontalWhitespace(*CurPtr)) {
     do {
       ++CurPtr;
-    } while (isHorizontalWhitespace(*CurPtr));
+    } while (*CurPtr == ' ' || isHorizontalWhitespace(*CurPtr));
 
     // If we are keeping whitespace and other tokens, just return what we just
     // skipped.  The next lexer invocation will return the token after the


        
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
