davrec added a comment. Suggested a few adjustments in `LexTokenInternal` you might want to test for perf improvements (orthogonal to this patch, but could boost its numbers :). And also noted that `Lexer::getBuffer()` has same issue as `getBufferLocation()` re potential for pointer invalidation IIUC. At a minimum we need comments on these functions explaining any risks; better still to remove them from the public interface. If downstream users use these functions and complain, good - they need to be aware of this change.
================ Comment at: clang/include/clang/Lex/Lexer.h:285 /// Gets source code buffer. - StringRef getBuffer() const { - return StringRef(BufferStart, BufferEnd - BufferStart); - } + StringRef getBuffer() const { return StringRef(BufferStart, BufferSize); } ---------------- Same issue as with `getBufferLocation()`, publicly returning it permits possible pointer invalidation. Fortunately I only see it used in a single spot (prior to this patch anyway) which can be easily eliminated IIUC. Yank this function? Or make private/append "Unsafe" to name (and explain in comments)? ================ Comment at: clang/include/clang/Lex/Lexer.h:307 /// Return the current location in the buffer. - const char *getBufferLocation() const { return BufferPtr; } + const char *getBufferLocation() const { + assert(BufferOffset <= BufferSize && "Invalid buffer state"); ---------------- davrec wrote: > I think I'd like this to return `unsigned`; i.e. I think after this patch we > should not even be publicly exposing buffer locations as pointers, IIUC. A > brief search for uses of `getBufferLocation()` (there aren't many) suggests > this would be probably be fine and indeed would get rid of some unnecessary > pointer arithmetic. And indeed if anything really needs that `const char *` > that might be a red flag to investigate further. Looking more closely I see that `getCurrentBufferOffset` returns the unsigned, and this patch already changes some `getBufferLocation` usages to `getCurrentBufferOffset`. So, I say either yank it or make it private or append "Unsafe" and explain in comments. ================ Comment at: clang/lib/Lex/Lexer.cpp:2948-2949 + if (isHorizontalWhitespace(BufferStart[CurOffset])) { + SkipWhitespace(Result, CurOffset + 1, TokAtPhysicalStartOfLine); + return false; } ---------------- indent ================ Comment at: clang/lib/Lex/Lexer.cpp:2973-2975 + char Char = getAndAdvanceChar(CurOffset, Tmp); + switch (Char) { + default: ---------------- indent ================ Comment at: clang/lib/Lex/Lexer.cpp:3630-3632 do { - ++CurPtr; - } while (isHorizontalWhitespace(*CurPtr)); + ++CurOffset; + } while (isHorizontalWhitespace(BufferStart[CurOffset])); ---------------- ``` for (isHorizontalWhitespace(BufferStart[++CurOffset]);;) ; ``` might save a few instructions? Worth trying since this function is the main perf-critical one. ================ Comment at: clang/lib/Lex/Lexer.cpp:3739-3752 + if (BufferStart[CurOffset] == '/' && BufferStart[CurOffset + 1] == '/' && + !inKeepCommentMode() && LineComment && + (LangOpts.CPlusPlus || !LangOpts.TraditionalCPP)) { + if (SkipLineComment(Result, CurOffset + 2, TokAtPhysicalStartOfLine)) return true; // There is a token to return. goto SkipIgnoredUnits; + } else if (BufferStart[CurOffset] == '/' && ---------------- Spitballing again for possible minor perf improvements: ``` if (char Char0 = BufferStart[CurOffset] == '/' && !inKeepCommentMode()) { if (char Char1 = BufferStart[CurOffset + 1] == '/' && LineComment && (LangOpts.CPlusPlus || !LangOpts.TraditionalCPP)) { if (SkipLineComment(Result, CurOffset + 2, TokAtPhysicalStartOfLine)) return true; // There is a token to return. goto SkipIgnoredUnits; } else if (Char1 == '*') { if (SkipBlockComment(Result, CurOffset + 2, TokAtPhysicalStartOfLine)) return true; // There is a token to return. goto SkipIgnoredUnits; } } else if (isHorizontalWhitespace(Char0)) { goto SkipHorizontalWhitespace; } ``` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D143142/new/ https://reviews.llvm.org/D143142 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits