cor3ntin added a comment. This is an impressive amount of work. I think it makes sense! Thanks a lot for doing that work. I only have a few nits after a first review of this.
================ Comment at: clang/include/clang/Lex/Lexer.h:89 - // End of the buffer. - const char *BufferEnd; + // Size of the buffer. + unsigned BufferSize; ---------------- Should that use `SourceLocation::UIntTy`? Looking at comments in SourceManager, I think there was an attempt at supporting > 2GB file but I don't think it got anywhere. Nevertheless, using `SourceLocation::UIntTy` would arguably be more consistent It does seem to be a huge undertaking to change it though, I'm not sure it would be worth it at all. There would be far bigger issues with ridiculously large source files anyway. ================ Comment at: clang/include/clang/Lex/Lexer.h:609 - bool CheckUnicodeWhitespace(Token &Result, uint32_t C, const char *CurPtr); + bool CheckUnicodeWhitespace(Token &Result, uint32_t C, unsigned CurOffset); ---------------- davrec wrote: > FWIW it sucks that `uint32_t` is already sprinkled throughout the interface > alongside `unsigned`, wish just one was used consistently, but that does not > need to be addressed in this patch. here, `uint32_t` is a codepoint, should arguably be `char32_t` instead. but agreed, not in this patch. ================ Comment at: clang/lib/Lex/Lexer.cpp:213 + L->BufferSize = L->BufferOffset + TokLen; + assert(L->BufferStart[L->BufferSize] == 0 && "Buffer is not nul terminated!"); ---------------- ================ Comment at: clang/lib/Lex/Lexer.cpp:1203 if (L && !L->isLexingRawMode()) - L->Diag(CP-2, diag::trigraph_ignored); + L->Diag(CP - 2 - L->getBuffer().data(), diag::trigraph_ignored); return 0; ---------------- sunho wrote: > shafik wrote: > > I wonder do we really need to do these pointer gymnastics, maybe making > > this a member function would eliminate the need for it. > Yes, we can change it to offset here. Agreed, that would be nice! (In all places `getBuffer().data()` is used) ================ Comment at: clang/lib/Lex/Lexer.cpp:1353 + if (unsigned EscapedNewLineSize = + getEscapedNewLineSize(&BufferStart[Offset])) { // Remember that this token needs to be cleaned. ---------------- ================ Comment at: clang/lib/Lex/Lexer.cpp:1378 // a trigraph warning. If so, and if trigraphs are enabled, return it. - if (char C = DecodeTrigraphChar(Ptr + 2, Tok ? this : nullptr, - LangOpts.Trigraphs)) { + if (char C = DecodeTrigraphChar(&BufferStart[Offset + 2], + Tok ? this : nullptr, LangOpts.Trigraphs)) { ---------------- ================ Comment at: clang/lib/Lex/Lexer.cpp:1740 +bool Lexer::tryConsumeIdentifierUTF8Char(unsigned &CurOffset) { + const char *UnicodePtr = &BufferStart[CurOffset]; llvm::UTF32 CodePoint; ---------------- ================ Comment at: clang/lib/Lex/Lexer.cpp:1744 llvm::convertUTF8Sequence((const llvm::UTF8 **)&UnicodePtr, - (const llvm::UTF8 *)BufferEnd, - &CodePoint, - llvm::strictConversion); + (const llvm::UTF8 *)&BufferStart[BufferSize], + &CodePoint, llvm::strictConversion); ---------------- Ditto in all similar places, I think it reads easier Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D143142/new/ https://reviews.llvm.org/D143142 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits