tahonermann requested changes to this revision. tahonermann added a comment. This revision now requires changes to proceed.
Splitting this into two patches (one to do the renames, another to perform the other changes) would simplify review. ================ Comment at: clang/lib/Frontend/TextDiagnostic.cpp:118-119 - unsigned char const *begin, *end; - begin = reinterpret_cast<unsigned char const *>(&*(SourceLine.begin() + *i)); - end = begin + (SourceLine.size() - *i); - - if (llvm::isLegalUTF8Sequence(begin, end)) { - llvm::UTF32 c; - llvm::UTF32 *cptr = &c; - unsigned char const *original_begin = begin; - unsigned char const *cp_end = - begin + llvm::getNumBytesForUTF8(SourceLine[*i]); - - llvm::ConversionResult res = llvm::ConvertUTF8toUTF32( - &begin, cp_end, &cptr, cptr + 1, llvm::strictConversion); - (void)res; - assert(llvm::conversionOK == res); - assert(0 < begin-original_begin - && "we must be further along in the string now"); - *i += begin-original_begin; - - if (!llvm::sys::locale::isPrint(c)) { - // If next character is valid UTF-8, but not printable - SmallString<16> expandedCP("<U+>"); - while (c) { - expandedCP.insert(expandedCP.begin()+3, llvm::hexdigit(c%16)); - c/=16; - } - while (expandedCP.size() < 8) - expandedCP.insert(expandedCP.begin()+3, llvm::hexdigit(0)); - return std::make_pair(expandedCP, false); - } + const unsigned char *Begin = + reinterpret_cast<unsigned char const *>(&*(SourceLine.begin())) + (*I); + unsigned CharSize = llvm::getNumBytesForUTF8(*Begin); ---------------- ================ Comment at: clang/lib/Frontend/TextDiagnostic.cpp:120 - begin = reinterpret_cast<unsigned char const *>(&*(SourceLine.begin() + *i)); - end = begin + (SourceLine.size() - *i); - ---------------- tbaeder wrote: > I don't know what this computation of `end` means, but from the debug output > I added, it meant that a call to this function always converted the entire > line. I think. It looks to me like an overly complicated way of asking for `SourceLine.bytes_end()`. ================ Comment at: clang/lib/Frontend/TextDiagnostic.cpp:121 + unsigned CharSize = llvm::getNumBytesForUTF8(*Begin); + const unsigned char *End = Begin + CharSize; ---------------- This could assign `End` to a location past the end of the source line when the source line ends with an ill-formed code unit sequence (e.g., a truncated 4-byte sequence). Constructing such a pointer is likely to get one in trouble. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D150843/new/ https://reviews.llvm.org/D150843 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits