rsmith added inline comments.
================ Comment at: clang/include/clang/Lex/Preprocessor.h:145 ModuleLoader &TheModuleLoader; + LiteralConverter LT; ---------------- Please give this a longer name. Abbreviation names should only be used in fairly small scopes where it's easy to look up what they refer to. Also: why `LT`? What does the `T` stand for? ================ Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6178 + CmdArgs.push_back(Args.MakeArgString(Triple.getSystemCharset())); + for (auto it = vList.begin(), ie = vList.end(); it != ie; ++it) { + llvm::ErrorOr<llvm::CharSetConverter> ErrorOrConverter = ---------------- Looping over all the arguments is a little unusual. Normally we'd get the last argument value and only check that one. Do you need to pass more than one value onto the frontend? ================ Comment at: clang/lib/Lex/LiteralSupport.cpp:237 + SmallString<8> ResultCharConv; + Converter->convert(std::string(1, ByteChar), ResultCharConv); + assert(ResultCharConv.size() == 1 && ---------------- Can you avoid using `std::string` here? Eg, pass a `StringRef`, extending the converter to be able to take one if necessary. ================ Comment at: clang/lib/Lex/LiteralSupport.cpp:1363-1364 + if (!HadError && Converter) { + assert(Kind != tok::wide_char_constant && + "Wide character translation not supported"); + SmallString<1> ConvertedChar; ---------------- Why is this case not possible? ================ Comment at: clang/lib/Lex/LiteralSupport.cpp:1367 + Converter->convert(StringRef((char *)tmp_out_start), ConvertedChar); + memmove((void *)tmp_out_start, ConvertedChar.data(), 1); + } ---------------- What assurance do we have that 1 output character is correct? I would expect we need to reject with a diagnostic if the character doesn't fit in one converted character. ================ Comment at: clang/lib/Lex/LiteralSupport.cpp:1700-1702 // Point into the \n inside the \r\n sequence and operate on the // remaining portion of the literal. RemainingTokenSpan = AfterCRLF.substr(1); ---------------- Do we need to convert the newline character too? Perhaps for raw string literals it'd be better to do the normal processing here and then convert the entire string at once? ================ Comment at: clang/lib/Lex/LiteralSupport.cpp:234 + SmallString<8> ResultCharConv; + Converter->convert(std::string(1, ByteChar), ResultCharConv); + memcpy((void *)&ResultChar, ResultCharConv.data(), sizeof(unsigned)); ---------------- abhina.sreeskantharajan wrote: > tahonermann wrote: > > Conversion can fail here, particularly in the scenario corresponding to the > > default switch case above; `ResultChar` could contain, for example, a lead > > byte of a UTF-8 sequence. Something sensible should be done here; either > > rejecting the code with an error or substituting `?` (in the execution > > encoding) seems appropriate to me. > Thanks, I added the substitution with the '?' character for invalid escapes. This is a regression. Our prior behavior for unknown escapes was to leave the character alone. We should still do that wherever possible -- eg, `\q` should produce `q` -- and take fallback action only if the character is unencodable. Producing a `?` seems unlikely to ever be what anyone wants; producing a hard error would seem preferable. ================ Comment at: clang/lib/Lex/LiteralSupport.cpp:238 + void *Pointer = &ResultChar; + memcpy(Pointer, ResultCharConv.data(), sizeof(unsigned)); + } ---------------- abhina.sreeskantharajan wrote: > abhina.sreeskantharajan wrote: > > rsmith wrote: > > > What should happen if the result doesn't fit into an `unsigned`? This > > > also appears to be making problematic assumptions about the endianness of > > > the host. If we really want to pack multiple bytes of encoded output into > > > a single `unsigned` result value (which itself seems dubious), we should > > > do so with an endianness that doesn't depend on the host. > > This may be a problem we need to revisit since ResultChar is expecting a > > char. > I added an assertion for this case where the size of the character increases > after translation. I've also removed the memcpy to avoid endianness issues. Is there any guarantee the assertion will not fail? ================ Comment at: clang/test/Driver/cl-options.c:214 +// RUN: %clang_cl /execution-charset:invalid-charset -### -- %s 2>&1 | FileCheck -check-prefix=execution-charset-invalid %s +// execution-charset-invalid: invalid value 'invalid-charset' in '-fexec-charset' // ---------------- Please use the given spelling of the flag in the diagnostic. (You can ask the argument how it was spelled.) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D93031/new/ https://reviews.llvm.org/D93031 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits