[PATCH] D119221: [clang][lexer] Allow u8 character literal prefixes in C2x

Tom Honermann via Phabricator via cfe-commits Fri, 11 Feb 2022 19:11:35 -0800

tahonermann added inline comments.


================
Comment at: clang/lib/Lex/Lexer.cpp:3462
 
-  case 'u':   // Identifier (uber) or C11/C++11 UTF-8 or UTF-16 string literal
+  case 'u': // Identifier (uber) or C11/C2x/C++11 UTF-8 or UTF-16 string literal
     // Notify MIOpt that we read a non-whitespace/non-comment token.
----------------
The comment is slightly misleading both before and after this change. Assuming 
this level of detail is desired, I suggest:
  // Identifier (e.g., uber), or
  // UTF-8 (C2x/C++17) or UTF-16 (C11/C++11) character literal, or
  // UTF-8 or UTF-16 string literal (C11/C++11).
  case 'u':
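
As a rough illustration of the token shapes this case has to tell apart, here is a toy classifier. It is only a sketch: `UToken` and `classifyU` are hypothetical names, not Clang's token kinds or Lexer API, and it ignores language-mode checks and raw string literals.

```cpp
#include <cassert>

// Toy categories for a token that begins with 'u'. Hypothetical names;
// they do not correspond to Clang's token kinds.
enum class UToken { Identifier, Utf8Char, Utf16Char, Utf8String, Utf16String };

// Classify the start of a NUL-terminated string that begins with 'u'.
// Language-mode gating (C11, C2x, C++11, C++17) is deliberately left out.
UToken classifyU(const char *s) {
  assert(s[0] == 'u');
  if (s[1] == '\'') return UToken::Utf16Char;    // u'x'
  if (s[1] == '"')  return UToken::Utf16String;  // u"x"
  if (s[1] == '8') {
    if (s[2] == '\'') return UToken::Utf8Char;   // u8'x'
    if (s[2] == '"')  return UToken::Utf8String; // u8"x"
  }
  return UToken::Identifier;                     // uber, u8, u8x, ...
}
```

The three groups in the suggested comment map onto the branches above: identifier, character literal (UTF-8 or UTF-16), and string literal (UTF-8 or UTF-16).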


================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:23
+char f = u8'ab';            // expected-error {{Unicode character literals may not contain multiple characters}}
+char g = u8'\x80';          // expected-warning {{implicit conversion from 'int' to 'char' changes value from 128 to -128}}
 #endif
----------------
aaron.ballman wrote:
> One more test I'd like to see added, just to make sure we're covering 
> 6.4.4.4p9 properly:
> ```
> _Static_assert(
>   _Generic(u8'a',
>            default: 0,
>            unsigned char : 1),
>   "Surprise!");  
> ```
> We expect the type of a u8 character literal to be `unsigned char` at the 
> moment, which is different from a u8 string literal, which uses `char`.
> 
> However, WG14 is also going to be considering 
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm for C2x at our 
> meeting next week.
Good suggestion. I believe the following update will be needed 
to `Sema::ActOnCharacterConstant()` in `clang/lib/Sema/SemaExpr.cpp`:
  ...
  else if (Literal.isUTF8() && getLangOpts().C2x)
    Ty = Context.UnsignedCharTy; // u8'x' -> unsigned char in C2x.
  else if (Literal.isUTF8() && getLangOpts().Char8)
    Ty = Context.Char8Ty; // u8'x' -> char8_t when it exists.
  ...
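
The branch order matters: the C2x check has to come before the `Char8` check. A minimal standalone model of just these two branches, assuming nothing about the elided code around them (`ToyLangOpts`, `ToyType`, and `u8CharLiteralType` are hypothetical names, not Clang's):

```cpp
// Toy stand-ins for the two language options consulted above.
struct ToyLangOpts {
  bool C2x;
  bool Char8;
};

// Other covers whatever the elided branches would choose.
enum class ToyType { UnsignedChar, Char8T, Other };

// Mirrors the two u8 branches in the suggested update, in the same order.
constexpr ToyType u8CharLiteralType(ToyLangOpts LO) {
  return LO.C2x     ? ToyType::UnsignedChar // u8'x' -> unsigned char in C2x
         : LO.Char8 ? ToyType::Char8T       // u8'x' -> char8_t when it exists
                    : ToyType::Other;       // left to the elided branches
}

static_assert(u8CharLiteralType(ToyLangOpts{true, false}) ==
                  ToyType::UnsignedChar,
              "C2x selects unsigned char");
static_assert(u8CharLiteralType(ToyLangOpts{false, true}) == ToyType::Char8T,
              "char8_t mode selects char8_t");
```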



================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:24
+char g = u8'\x80';          // expected-warning {{implicit conversion from 'int' to 'char' changes value from 128 to -128}}
 #endif
----------------
We should also exercise the preprocessor with something like this:
  #if u8'\xff' != 0xff
  #error uh oh
  #endif

Hmm, this currently fails in C++20 mode with both Clang and GCC unless 
`-funsigned-char` is passed, which seems wrong: 
https://godbolt.org/z/Tb7z85ToG. MSVC gets this wrong too, but I think for a 
different reason; see the implementation impact section of 
[[ https://wg21.link/p2029 | P2029 ]] if curious.
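
To illustrate the arithmetic that presumably sits behind that failure (this models only the signedness of the value, not the preprocessor itself): if the byte `0xff` is treated as a signed 8-bit value it no longer compares equal to `0xff`, whereas as an unsigned 8-bit value it does. The helper names are hypothetical, and the signed result assumes the usual two's-complement targets.

```cpp
// Value of a byte after promotion to int when it is treated as signed.
constexpr int asSignedChar(unsigned char byte) {
  return static_cast<signed char>(byte);
}

// Value of a byte after promotion to int when it is treated as unsigned.
constexpr int asUnsignedChar(unsigned char byte) { return byte; }

// Unsigned interpretation: the comparison in the #if above would hold.
static_assert(asUnsignedChar(0xff) == 0xff, "unsigned view keeps 0xff");

// Signed interpretation: the value differs, so the #error would trigger.
static_assert(asSignedChar(0xff) != 0xff, "signed view does not equal 0xff");
```

This is consistent with the observation that passing `-funsigned-char` makes the check succeed.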


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119221/new/

https://reviews.llvm.org/D119221

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
