[PATCH] D93031: Enable fexec-charset option

Richard Smith - zygoloid via Phabricator via cfe-commits Tue, 02 Mar 2021 12:12:53 -0800

rsmith added inline comments.


================
Comment at: clang/include/clang/Lex/Preprocessor.h:145
   ModuleLoader      &TheModuleLoader;
+  LiteralConverter  LT;
 
----------------
Please give this a longer name. Abbreviation names should only be used in 
fairly small scopes where it's easy to look up what they refer to.

Also: why `LT`? What does the `T` stand for?


================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6178
+  CmdArgs.push_back(Args.MakeArgString(Triple.getSystemCharset()));
+  for (auto it = vList.begin(), ie = vList.end(); it != ie; ++it) {
+    llvm::ErrorOr<llvm::CharSetConverter> ErrorOrConverter =
----------------
Looping over all the arguments is a little unusual. Normally we'd get the last 
argument value and only check that one. Do you need to pass more than one value 
onto the frontend?


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:237
+    SmallString<8> ResultCharConv;
+    Converter->convert(std::string(1, ByteChar), ResultCharConv);
+    assert(ResultCharConv.size() == 1 &&
----------------
Can you avoid using `std::string` here? Eg, pass a `StringRef`, extending the 
converter to be able to take one if necessary.


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1363-1364
+          if (!HadError && Converter) {
+            assert(Kind != tok::wide_char_constant &&
+                   "Wide character translation not supported");
+            SmallString<1> ConvertedChar;
----------------
Why is this case not possible?


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1367
+            Converter->convert(StringRef((char *)tmp_out_start), 
ConvertedChar);
+            memmove((void *)tmp_out_start, ConvertedChar.data(), 1);
+          }
----------------
What assurance do we have that 1 output character is correct? I would expect we 
need to reject with a diagnostic if the character doesn't fit in one converted 
character.


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1700-1702
         // Point into the \n inside the \r\n sequence and operate on the
         // remaining portion of the literal.
         RemainingTokenSpan = AfterCRLF.substr(1);
----------------
Do we need to convert the newline character too?

Perhaps for raw string literals it'd be better to do the normal processing here 
and then convert the entire string at once?


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:234
+    SmallString<8> ResultCharConv;
+    Converter->convert(std::string(1, ByteChar), ResultCharConv);
+    memcpy((void *)&ResultChar, ResultCharConv.data(), sizeof(unsigned));
----------------
abhina.sreeskantharajan wrote:
> tahonermann wrote:
> > Conversion can fail here, particularly in the scenario corresponding to the 
> > default switch case above; `ResultChar` could contain, for example, a lead 
> > byte of a UTF-8 sequence.  Something sensible should be done here; either 
> > rejecting the code with an error or substituting `?` (in the execution 
> > encoding) seems appropriate to me.
> Thanks, I added the substitution with the '?' character for invalid escapes.
This is a regression. Our prior behavior for unknown escapes was to leave the 
character alone. We should still do that wherever possible -- eg, `\q` should 
produce `q` -- and take fallback action only if the character is unencodable. 
Producing a `?` seems unlikely to ever be what anyone wants; producing a hard 
error would seem preferable.


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:238
+      void *Pointer = &ResultChar;
+      memcpy(Pointer, ResultCharConv.data(), sizeof(unsigned));
+    }
----------------
abhina.sreeskantharajan wrote:
> abhina.sreeskantharajan wrote:
> > rsmith wrote:
> > > What should happen if the result doesn't fit into an `unsigned`? This 
> > > also appears to be making problematic assumptions about the endianness of 
> > > the host. If we really want to pack multiple bytes of encoded output into 
> > > a single `unsigned` result value (which itself seems dubious), we should 
> > > do so with an endianness that doesn't depend on the host.
> > This may be a problem we need to revisit since ResultChar is expecting a 
> > char.
> I added an assertion for this case where the size of the character increases 
> after translation. I've also removed the memcpy to avoid endianness issues.
Is there any guarantee the assertion will not fail?


================
Comment at: clang/test/Driver/cl-options.c:214
+// RUN: %clang_cl /execution-charset:invalid-charset -### -- %s 2>&1 | 
FileCheck -check-prefix=execution-charset-invalid %s
+// execution-charset-invalid: invalid value 'invalid-charset' in 
'-fexec-charset'
 //
----------------
Please use the given spelling of the flag in the diagnostic. (You can ask the 
argument how it was spelled.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93031/new/

https://reviews.llvm.org/D93031

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D93031: Enable fexec-charset option

Reply via email to