tahonermann added inline comments.
================
Comment at: clang/lib/Lex/Lexer.cpp:3462
-  case 'u': // Identifier (uber) or C11/C++11 UTF-8 or UTF-16 string literal
+  case 'u': // Identifier (uber) or C11/C2x/C++11 UTF-8 or UTF-16 string literal
     // Notify MIOpt that we read a non-whitespace/non-comment token.
----------------
The comment is slightly misleading both before and after this change. Assuming this level of detail is desired, I suggest:

  // Identifier (e.g., uber), or
  // UTF-8 (C2x/C++17) or UTF-16 (C11/C++11) character literal, or
  // UTF-8 or UTF-16 string literal (C11/C++11).
  case 'u':

================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:23
+char f = u8'ab'; // expected-error {{Unicode character literals may not contain multiple characters}}
+char g = u8'\x80'; // expected-warning {{implicit conversion from 'int' to 'char' changes value from 128 to -128}}
 #endif
----------------
aaron.ballman wrote:
> One more test I'd like to see added, just to make sure we're covering
> 6.4.4.4p9 properly:
> ```
> _Static_assert(
>     _Generic(u8'a',
>              default: 0,
>              unsigned char : 1),
>     "Surprise!");
> ```
> We expect the type of a u8 character literal to be `unsigned char` at the
> moment, which is different from a u8 string literal, which uses `char`.
>
> However, WG14 is also going to be considering
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm for C2x at our
> meeting next week.

Good suggestion. I believe the following update to `Sema::ActOnCharacterConstant()` in `clang/lib/Sema/SemaExpr.cpp` will be needed:

  ...
  else if (Literal.isUTF8() && getLangOpts().C2x)
    Ty = Context.UnsignedCharTy; // u8'x' -> unsigned char in C2x.
  else if (Literal.isUTF8() && getLangOpts().Char8)
    Ty = Context.Char8Ty; // u8'x' -> char8_t when it exists.
  ...
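To make the precedence of the suggested branches concrete, here is a minimal C++ sketch that models only the type chosen for a `u8` character literal. `LangModeSketch`, `u8CharLiteralType`, and `checkU8CharLiteralTypes` are hypothetical names, not Clang APIs, and the final `char` fallback is an assumption about the existing C++ rule that the new branches would fall through to.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the two relevant clang::LangOptions bits.
struct LangModeSketch {
  bool C2x;   // compiling C2x
  bool Char8; // char8_t exists (C++20 or -fchar8_t)
};

// Returns the spelling of the type a u8 character literal would get.
// The first two checks mirror the suggested Sema branches, in order.
std::string u8CharLiteralType(const LangModeSketch &LO) {
  if (LO.C2x)
    return "unsigned char"; // u8'x' -> unsigned char in C2x
  if (LO.Char8)
    return "char8_t";       // u8'x' -> char8_t when it exists
  return "char";            // assumed fallback: 'x' -> char in C++17
}

// Spot checks for the three modes modeled above.
void checkU8CharLiteralTypes() {
  assert(u8CharLiteralType({true, false}) == "unsigned char"); // C2x
  assert(u8CharLiteralType({false, true}) == "char8_t");       // C++20
  assert(u8CharLiteralType({false, false}) == "char");         // C++17
}
```

Because the C2x check comes first, it wins if both flags were ever set at once. Without that branch, a C2x `u8'x'` would presumably fall through to the ordinary narrow character literal rule (`int` in C), which would explain the 'int' in the warning the current test expects.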
================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:24
+char g = u8'\x80'; // expected-warning {{implicit conversion from 'int' to 'char' changes value from 128 to -128}}
 #endif
----------------
We should also exercise the preprocessor with something like this:

  #if u8'\xff' != 0xff
  #error uh oh
  #endif

Hmm, this currently fails for C++20 in both Clang and gcc unless `-funsigned-char` is passed (https://godbolt.org/z/Tb7z85ToG). That seems wrong: in C++20, `u8'\xff'` has type `char8_t`, which is unsigned, so I would expect it to evaluate to 255 regardless of whether plain `char` is signed. MSVC gets this wrong too, but I think for a different reason; see the implementation impact section of [[ https://wg21.link/p2029 | P2029 ]] if curious.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119221/new/

https://reviews.llvm.org/D119221

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
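For the `#if u8'\xff' != 0xff` check suggested in the last comment, here is a minimal C++ sketch of the arithmetic behind the failure. It assumes, as inferred from `-funsigned-char` making the check pass, that the preprocessor evaluates `u8'\xff'` with the signedness of plain `char`. The helper names are hypothetical.

```cpp
#include <cassert>

// Value of a one-byte literal such as u8'\xff' if it takes on the signedness
// of plain `char` on a signed-char target. The cast wraps modulo 256: this is
// guaranteed in C++20 and is what GCC and Clang do in earlier modes too.
long long valueIfSigned(unsigned char byte) {
  return static_cast<signed char>(byte);
}

// Value of the same literal if it is unsigned (char8_t, or -funsigned-char).
long long valueIfUnsigned(unsigned char byte) {
  return byte;
}

// Models `#if u8'\xff' != 0xff`: returns true when the #error would fire.
bool errorFires(long long literalValue) {
  return literalValue != 0xff;
}

// Spot checks for both signedness assumptions.
void checkPreprocessorExample() {
  assert(valueIfSigned(0xff) == -1);      // signed plain char
  assert(errorFires(valueIfSigned(0xff)));
  assert(valueIfUnsigned(0xff) == 255);   // char8_t / -funsigned-char
  assert(!errorFires(valueIfUnsigned(0xff)));
  assert(valueIfSigned(0x80) == -128);    // cf. the 128 -> -128 test warning
}
```

The same wraparound is what the existing test's warning describes when the value 128 is stored into a signed `char`.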