rsmith added inline comments.
================
Comment at: clang/include/clang/Lex/Preprocessor.h:145
ModuleLoader &TheModuleLoader;
+ LiteralConverter LT;
----------------
Please give this a longer name. Abbreviation names should only be used in
fairly small scopes where it's easy to look up what they refer to.
Also: why `LT`? What does the `T` stand for?
================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6178
+ CmdArgs.push_back(Args.MakeArgString(Triple.getSystemCharset()));
+ for (auto it = vList.begin(), ie = vList.end(); it != ie; ++it) {
+ llvm::ErrorOr<llvm::CharSetConverter> ErrorOrConverter =
----------------
Looping over all the arguments is a little unusual. Normally we'd get the last
argument value and only check that one. Do you need to pass more than one value
onto the frontend?
================
Comment at: clang/lib/Lex/LiteralSupport.cpp:237
+ SmallString<8> ResultCharConv;
+ Converter->convert(std::string(1, ByteChar), ResultCharConv);
+ assert(ResultCharConv.size() == 1 &&
----------------
Can you avoid using `std::string` here? Eg, pass a `StringRef`, extending the
converter to be able to take one if necessary.
================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1363-1364
+ if (!HadError && Converter) {
+ assert(Kind != tok::wide_char_constant &&
+ "Wide character translation not supported");
+ SmallString<1> ConvertedChar;
----------------
Why is this case not possible?
================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1367
+ Converter->convert(StringRef((char *)tmp_out_start),
ConvertedChar);
+ memmove((void *)tmp_out_start, ConvertedChar.data(), 1);
+ }
----------------
What assurance do we have that 1 output character is correct? I would expect we
need to reject with a diagnostic if the character doesn't fit in one converted
character.
================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1700-1702
// Point into the \n inside the \r\n sequence and operate on the
// remaining portion of the literal.
RemainingTokenSpan = AfterCRLF.substr(1);
----------------
Do we need to convert the newline character too?
Perhaps for raw string literals it'd be better to do the normal processing here
and then convert the entire string at once?
================
Comment at: clang/lib/Lex/LiteralSupport.cpp:234
+ SmallString<8> ResultCharConv;
+ Converter->convert(std::string(1, ByteChar), ResultCharConv);
+ memcpy((void *)&ResultChar, ResultCharConv.data(), sizeof(unsigned));
----------------
abhina.sreeskantharajan wrote:
> tahonermann wrote:
> > Conversion can fail here, particularly in the scenario corresponding to the
> > default switch case above; `ResultChar` could contain, for example, a lead
> > byte of a UTF-8 sequence. Something sensible should be done here; either
> > rejecting the code with an error or substituting `?` (in the execution
> > encoding) seems appropriate to me.
> Thanks, I added the substitution with the '?' character for invalid escapes.
This is a regression. Our prior behavior for unknown escapes was to leave the
character alone. We should still do that wherever possible -- eg, `\q` should
produce `q` -- and take fallback action only if the character is unencodable.
Producing a `?` seems unlikely to ever be what anyone wants; producing a hard
error would seem preferable.
================
Comment at: clang/lib/Lex/LiteralSupport.cpp:238
+ void *Pointer = &ResultChar;
+ memcpy(Pointer, ResultCharConv.data(), sizeof(unsigned));
+ }
----------------
abhina.sreeskantharajan wrote:
> abhina.sreeskantharajan wrote:
> > rsmith wrote:
> > > What should happen if the result doesn't fit into an `unsigned`? This
> > > also appears to be making problematic assumptions about the endianness of
> > > the host. If we really want to pack multiple bytes of encoded output into
> > > a single `unsigned` result value (which itself seems dubious), we should
> > > do so with an endianness that doesn't depend on the host.
> > This may be a problem we need to revisit since ResultChar is expecting a
> > char.
> I added an assertion for this case where the size of the character increases
> after translation. I've also removed the memcpy to avoid endianness issues.
Is there any guarantee the assertion will not fail?
================
Comment at: clang/test/Driver/cl-options.c:214
+// RUN: %clang_cl /execution-charset:invalid-charset -### -- %s 2>&1 |
FileCheck -check-prefix=execution-charset-invalid %s
+// execution-charset-invalid: invalid value 'invalid-charset' in
'-fexec-charset'
//
----------------
Please use the given spelling of the flag in the diagnostic. (You can ask the
argument how it was spelled.)
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D93031/new/
https://reviews.llvm.org/D93031
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits