[PATCH] D119221: [clang][lexer] Allow u8 character literal prefixes in C2x

Aaron Ballman via Phabricator via cfe-commits Tue, 05 Apr 2022 14:12:51 -0700

aaron.ballman added inline comments.


================
Comment at: clang/lib/Lex/Lexer.cpp:3462
 
-  case 'u':   // Identifier (uber) or C11/C++11 UTF-8 or UTF-16 string literal
+  // Identifer (e.g., uber), or
+  // UTF-8 (C2x/C++17) or UTF-16 (C11/C++11) character literal, or
----------------



================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:3
 // RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c11 -x c -fsyntax-only 
-verify %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c2x -DC2X -x c 
-fsyntax-only -verify %s
 // RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++1z -fsyntax-only 
-verify %s
----------------
no need for the new `-D`


================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:16
 char f = u8'ab'; // expected-error {{Unicode character literals may not 
contain multiple characters}}
+#elif defined(C2X)
+char a = u8'ñ';      // expected-error {{character too large for enclosing 
character literal type}}
----------------
You can test with `__STDC_VERSION__ > 202000L`.


================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:23
+char f = u8'ab';            // expected-error {{Unicode character literals may 
not contain multiple characters}}
+char g = u8'\x80';          // expected-warning {{implicit conversion from 
'int' to 'char' changes value from 128 to -128}}
 #endif
----------------
tahonermann wrote:
> aaron.ballman wrote:
> > One more test I'd like to see added, just to make sure we're covering 
> > 6.4.4.4p9 properly:
> > ```
> > _Static_assert(
> >   _Generic(u8'a',
> >            default: 0,
> >            unsigned char : 1),
> >   "Surprise!");  
> > ```
> > We expect the type of a u8 character literal to be `unsigned char` at the 
> > moment, which is different from a u8 string literal, which uses `char`.
> > 
> > However, WG14 is also going to be considering 
> > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm for C2x at our 
> > meeting next week.
> Good suggestion. I believe the following update will be needed 
> to`Sema::ActOnCharacterConstant()` in `clang/lib/Sema/SemaExpr.cpp`:
>   ...
>   else if (Literal.isUTF8() && getLangOpts().C2x)
>     Ty = Context.UnsignedCharTy; // u8'x' -> unsigned char in c2x.
>   else if Literal.isUTF8() && getLangOpts().Char8)
>     Ty = Context.Char8Ty; // u8'x' -> char8_t when it exists.
>   ...
> 
> However, WG14 is also going to be considering 
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm for C2x at our 
> meeting next week.

I have an update on this. We discussed the paper and took a straw poll:
```
Does WG14 wish to adopt N2653 in C23? 18/0/2 (consensus)
```
So we should make sure that we all agree this patch is in line with the changes 
from that paper. I believe your changes agree, but it'd be nice for 
@tahonermann to confirm.


================
Comment at: clang/test/Lexer/utf8-char-literal.cpp:24
+char g = u8'\x80';          // expected-warning {{implicit conversion from 
'int' to 'char' changes value from 128 to -128}}
 #endif
----------------
tbaeder wrote:
> tahonermann wrote:
> > We should also exercise the preprocessor with something like this:
> >   #if u8'\xff' != 0xff
> >   #error uh oh
> >   #endif
> > 
> > Hmm, this currently fails for C++20 for both Clang and gcc unless 
> > `-funsigned-char` is passed. That seems wrong. 
> > https://godbolt.org/z/Tb7z85ToG. MSVC gets this wrong too, but I think for 
> > a different reason; see the implementation impact section of [[ 
> > https://wg21.link/p2029 | P2029 ]] if curious.
> This also fails in C2x.
I don't think we need to fix the preprocessor behavior as part of this patch, 
but it would be good to file an issue for this so we know to track it down at 
some point.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119221/new/

https://reviews.llvm.org/D119221

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D119221: [clang][lexer] Allow u8 character literal prefixes in C2x

Reply via email to