llvmbot wrote:

<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-clang

Author: Thibault Monnier (Thibault-Monnier)

<details>
<summary>Changes</summary>

This change aims to maximize use of the SSE4.2 fast path in 
`fastParseASCIIIdentifier`.

If the binary is compiled with SSE4.2 enabled, the behavior is exactly the same 
as before, so there are no regressions.

If not, we compile both the SSE fast path and the scalar loop. At runtime, we 
check whether SSE4.2 is available using `__builtin_cpu_supports` and dispatch 
to the right function. If it _is_ available, this gives a net performance 
improvement. Otherwise, there is a very slight, negligible regression, which I 
believe is perfectly reasonable for a processor that does not support SSE4.2.
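
For readers who want to see the dispatch pattern in isolation, here is a minimal standalone sketch of the idea described above. It is not the code from this patch: `isIdentChar`, `scanIdentScalar`, `scanIdentSSE42`, `scanIdent` and the `SKETCH_HAS_SSE42_DISPATCH` macro are hypothetical names, the scalar loop is bounded by `End` rather than relying on a terminator in the buffer, and it assumes GCC or Clang (on other targets it simply falls back to the scalar loop).

```cpp
#include <cassert>
#include <cstddef>

// Assumption: runtime dispatch is only attempted on x86 with GCC/Clang.
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define SKETCH_HAS_SSE42_DISPATCH 1
#include <nmmintrin.h>
#else
#define SKETCH_HAS_SSE42_DISPATCH 0
#endif

// Hypothetical stand-in for clang's isAsciiIdentifierContinue.
static bool isIdentChar(unsigned char C) {
  return C == '_' || (C >= 'A' && C <= 'Z') || (C >= 'a' && C <= 'z') ||
         (C >= '0' && C <= '9');
}

// Portable fallback: one byte at a time, never reading past End.
static const char *scanIdentScalar(const char *Cur, const char *End) {
  assert(Cur <= End);
  while (Cur != End && isIdentChar(static_cast<unsigned char>(*Cur)))
    ++Cur;
  return Cur;
}

#if SKETCH_HAS_SSE42_DISPATCH
// The target attribute lets this one function use SSE4.2 instructions even
// when the rest of the translation unit is built without -msse4.2.
__attribute__((target("sse4.2"))) static const char *
scanIdentSSE42(const char *Cur, const char *End) {
  alignas(16) static constexpr char Ranges[16] = {
      '_', '_', 'A', 'Z', 'a', 'z', '0', '9',
  };
  const __m128i RangesV =
      _mm_load_si128(reinterpret_cast<const __m128i *>(Ranges));

  while (End - Cur >= 16) {
    const __m128i Chunk =
        _mm_loadu_si128(reinterpret_cast<const __m128i *>(Cur));
    // Index of the first byte that lies outside every range, or 16 if all
    // 16 bytes are identifier characters.
    const int Consumed =
        _mm_cmpistri(RangesV, Chunk,
                     _SIDD_LEAST_SIGNIFICANT | _SIDD_CMP_RANGES |
                         _SIDD_UBYTE_OPS | _SIDD_NEGATIVE_POLARITY);
    Cur += Consumed;
    if (Consumed != 16)
      return Cur;
  }
  // Fewer than 16 bytes left: finish with the scalar loop.
  return scanIdentScalar(Cur, End);
}
#endif

// Dispatcher: query the CPU once, cache the answer, then pick a path.
static const char *scanIdent(const char *Cur, const char *End) {
#if SKETCH_HAS_SSE42_DISPATCH
  static const bool HasSSE42 = __builtin_cpu_supports("sse4.2") != 0;
  if (HasSSE42)
    return scanIdentSSE42(Cur, End);
#endif
  return scanIdentScalar(Cur, End);
}
```

Calling `scanIdent(Buf, Buf + Len)` returns a pointer to the first byte that is not an identifier character, whichever path is taken. Caching the CPU query in a function-local static keeps the per-call cost to a single predictable branch.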

I checked locally on an old processor with QEMU to ensure this doesn't break 
compatibility; I'll need help writing proper tests, though.

The benchmark results are available at 
[llvm-compile-time-tracker](https://llvm-compile-time-tracker.com/compare.php?from=f88d060c4176d17df56587a083944637ca865cb3&to=c2cf8e936d72720eafd32d3b25e85a784112b226&stat=instructions%3Au).

---
Full diff: https://github.com/llvm/llvm-project/pull/171914.diff


1 File Affected:

- (modified) clang/lib/Lex/Lexer.cpp (+26-9) 


``````````diff
diff --git a/clang/lib/Lex/Lexer.cpp b/clang/lib/Lex/Lexer.cpp
index b282a600c0e56f..3b8fa0b9b7f364 100644
--- a/clang/lib/Lex/Lexer.cpp
+++ b/clang/lib/Lex/Lexer.cpp
@@ -46,9 +46,7 @@
 #include <string>
 #include <tuple>
 
-#ifdef __SSE4_2__
 #include <nmmintrin.h>
-#endif
 
 using namespace clang;
 
@@ -1921,9 +1919,17 @@ bool Lexer::LexUnicodeIdentifierStart(Token &Result, uint32_t C,
 }
 
 static const char *
-fastParseASCIIIdentifier(const char *CurPtr,
-                         [[maybe_unused]] const char *BufferEnd) {
-#ifdef __SSE4_2__
+fastParseASCIIIdentifierScalar(const char *CurPtr,
+                               [[maybe_unused]] const char *BufferEnd) {
+  unsigned char C = *CurPtr;
+  while (isAsciiIdentifierContinue(C))
+    C = *++CurPtr;
+  return CurPtr;
+}
+
+__attribute__((target("sse4.2"))) static const char *
+fastParseASCIIIdentifierSSE42(const char *CurPtr,
+                              [[maybe_unused]] const char *BufferEnd) {
   alignas(16) static constexpr char AsciiIdentifierRange[16] = {
       '_', '_', 'A', 'Z', 'a', 'z', '0', '9',
   };
@@ -1943,12 +1949,23 @@ fastParseASCIIIdentifier(const char *CurPtr,
       continue;
     return CurPtr;
   }
+
+  return fastParseASCIIIdentifierScalar(CurPtr, BufferEnd);
+}
+
+static bool supportsSSE42() {
+  static bool SupportsSSE42 = __builtin_cpu_supports("sse4.2");
+  return SupportsSSE42;
+}
+
+static const char *fastParseASCIIIdentifier(const char *CurPtr,
+                                            const char *BufferEnd) {
+#ifndef __SSE4_2__
+  if (LLVM_UNLIKELY(!supportsSSE42()))
+    return fastParseASCIIIdentifierScalar(CurPtr, BufferEnd);
 #endif
 
-  unsigned char C = *CurPtr;
-  while (isAsciiIdentifierContinue(C))
-    C = *++CurPtr;
-  return CurPtr;
+  return fastParseASCIIIdentifierSSE42(CurPtr, BufferEnd);
 }
 
 bool Lexer::LexIdentifierContinue(Token &Result, const char *CurPtr) {

``````````

</details>


https://github.com/llvm/llvm-project/pull/171914
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
