SebastianPerta created this revision.
SebastianPerta added reviewers: aaron.ballman, sammccall, DaanDeMeyer.
Herald added a subscriber: dylanmckay.
Herald added a project: All.
SebastianPerta requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

On 16-bit architectures, char32_t literals are truncated; for example,
U'\U00064321' is truncated to 0x4321. The literal value is accumulated in an
llvm::APInt whose width is the target's int width, which is only 16 bits on
these targets.
The issue can be seen with the RL78 backend, which I announced a while ago
(https://lists.llvm.org/pipermail/llvm-dev/2020-April/140546.html) and am
ready to upstream.
Upstream, the problem can be observed on MSP430. However, this patch alone is
not sufficient for MSP430, because Char32Type is left at the default type,
UnsignedInt, which is 16 bits wide on MSP430 (set in TargetInfo.cpp). On RL78
I set it to UnsignedLong, as AVR does (see AVR.h).
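
As a rough illustration of the arithmetic only (this is not the lexer code),
the sketch below masks a code point to a given bit width. The helper name
truncate_to_width is hypothetical and exists only for this example; it mimics
what happens when the value is stored in an integer of that width.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only: keep the low `width` bits of
// `value`, as if it were stored in an unsigned integer of that width.
constexpr std::uint32_t truncate_to_width(std::uint32_t value, unsigned width) {
  return width >= 32 ? value
                     : (value & ((std::uint32_t{1} << width) - 1));
}

// With a 16-bit accumulator the upper bits of U+64321 are lost.
static_assert(truncate_to_width(0x00064321u, 16) == 0x4321u,
              "16-bit storage drops the high bits");

// With a 32-bit accumulator the full code point survives.
static_assert(truncate_to_width(0x00064321u, 32) == 0x64321u,
              "32-bit storage keeps the full value");
```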

Regarding testing, I found the problem using the following test from the GCC
regression test suite:
gcc/testsuite/g++.dg/ext/utf32-1.C
I'm happy to write a new test if I can get pointers on where and how to write
it (the test fails at execution time, so I'm not sure how to test it without
executing it).
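
One possible way to check this without executing anything, sketched here as an
assumption rather than a finished test, is a compile-time assertion on the
literal's value. The constant name kLiteral is made up for this example, and
the sketch assumes the truncated value is what constant evaluation sees on an
affected target; it would still need to be adapted to the test suite.

```cpp
// Sketch only: a compile-time check that needs no execution. If the literal
// were truncated to 16 bits, the comparison would fail at compile time.
constexpr char32_t kLiteral = U'\U00064321';  // hypothetical name

// Compare as unsigned long so the constant fits even where int is 16 bits.
static_assert(static_cast<unsigned long>(kLiteral) == 0x64321UL,
              "char32_t literal was truncated");
```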


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D127363

Files:
  clang/lib/Lex/LiteralSupport.cpp


Index: clang/lib/Lex/LiteralSupport.cpp
===================================================================
--- clang/lib/Lex/LiteralSupport.cpp
+++ clang/lib/Lex/LiteralSupport.cpp
@@ -1597,7 +1597,7 @@
     IsMultiChar = false;
   }
 
-  llvm::APInt LitVal(PP.getTargetInfo().getIntWidth(), 0);
+  llvm::APInt LitVal(PP.getTargetInfo().getChar32Width(), 0);
 
   // Narrow character literals act as though their value is concatenated
   // in this implementation, but warn on overflow.

