tahonermann added a comment.

Thanks for your continued work on this, Tim! I think this is close. I did spot one issue and added a few other comments.

================
Comment at: clang/lib/Lex/PPExpressions.cpp:417-418
+      else if (Literal.isUTF8())
+        Val.setIsUnsigned(PP.getLangOpts().CPlusPlus ? PP.getLangOpts().Char8
+                                                      : true);
+      else
----------------
Thanks for breaking the conditions out; that does make this simpler to understand. I don't think this is right yet, though. In C++, if `PP.getLangOpts().Char8` is `false`, then signedness is determined by `PP.getLangOpts().CharIsSigned`. Perhaps this:

      else if (Literal.isUTF8()) {
        if (PP.getLangOpts().CPlusPlus)
          Val.setIsUnsigned(PP.getLangOpts().Char8 ? true : !PP.getLangOpts().CharIsSigned);
        else
          Val.setIsUnsigned(true);
      }

The test case didn't catch this because `char` is always a signed type for the variations that are exercised. We could add a variant that includes `-funsigned-char` and then modify the test based on the presence of `__CHAR_UNSIGNED__`, but that might get pretty awkward.


================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:3-4
 // RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c2x -x c -fsyntax-only -verify %s
-// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++1z -fsyntax-only -verify %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c2x -x c -fsyntax-only -fchar8_t -verify %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c2x -x c -fsyntax-only -fno-char8_t -verify %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++11 -fsyntax-only -verify %s
----------------
Does the `-fchar8_t` option have any effect in C at present? The gcc maintainers are currently not planning to acknowledge that option in C modes, since WG14 did not want to add language dialect concerns for C. This is why [[https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm|N2653]] doesn't have wording that includes a feature test macro. The gcc maintainers pushed back on the `_CHAR8_T_SOURCE` macro mentioned in the "Implementation Experience" section.
I think Clang should follow suit: attempts to use `-fchar8_t` or `-fno-char8_t` in C modes should be diagnosed, which means that we don't have to exercise these options with C2x.


================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:7-9
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++17 -fsyntax-only -fchar8_t -DCHAR8_T -verify %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++20 -fsyntax-only -verify %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++20 -fsyntax-only -fno-char8_t -DNO_CHAR8_T -verify %s
----------------
Rather than adding your own `CHAR8_T` and `NO_CHAR8_T` macros, you can use the predefined `__cpp_char8_t` feature test macro.


================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:37-47
+#if __cplusplus == 201703L
+#  if defined(CHAR8_T)
+#    if u8'\xff' == '\xff' // expected-warning {{right side of operator converted from negative value to unsigned}}
+#      error Something's not right.
+#    endif
+#  else
+#    if u8'\xff' != '\xff'
----------------
aaron.ballman wrote:
> The equality operators seem backwards to what @tahonermann was saying -- I
> read his comment as:
>
> C++17/14/11: u8'\xff' == '\xff'
> C++17/14/11, -fchar8_t: u8'\xff' != '\xff'
> C++20 and up: u8'\xff' != '\xff'
> C++20 and up, -fno-char8_t: u8'\xff' == '\xff'
>
> Hopefully Tom can clarify if I misunderstood.

Yes, that looks right (as long as the target has a signed `char` type).


https://reviews.llvm.org/D124996

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits