https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118281
Bug ID: 118281
Summary: Characters and universal character names that are not valid in identifiers are incorrectly rejected before preprocessing
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: luigighiron at gmail dot com
Target Milestone: ---

C allows a character (to be clear, "character" here means a Unicode scalar
value) or universal character name that could not start a normal token to
form a special token of its own:

> The categories of preprocessing tokens are: header names, identifiers,
> preprocessing numbers, character constants, string literals, punctuators, and
> both single universal character names as well as single non-white-space
> characters that do not lexically match the other preprocessing token
> categories.
Section 6.4 "Lexical elements" Paragraph 3 ISO/IEC 9899:2024

For example, given a macro such as #define STR(X)#X it is valid to write
STR(@), even though @ would not otherwise be a valid preprocessing token.
However, the following program is incorrectly rejected:

#include<stdio.h>
#define STR(X)#X
int main(){
    puts(STR(̈));
}

The argument to STR is a combining diaeresis character (U+0308), which cannot
start an identifier, so the invocation of STR should create a string literal
that contains a combining diaeresis character. GCC also incorrectly rejects
the program if the combining diaeresis is replaced with \u0308.

Furthermore, GCC even rejects universal character names that name characters
that would be valid when used directly:

#include<stdio.h>
#define STR(X)#X
int main(){
    puts(STR(\u263A));
}

This program is incorrectly rejected, even though replacing \u263A with ☺
(the character it names) results in GCC accepting the program.
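For comparison, here is a minimal sketch of the cases that are expected to work: ASCII characters that match no other preprocessing token category and are stringified before they reach the compiler proper. The helper name check_stray_tokens is hypothetical and exists only for this illustration. The rejected cases from the report (U+0308, \u0308 and \u263A) are deliberately left out so that the snippet still compiles on an affected GCC.

```c
#include <assert.h>
#include <string.h>

#define STR(X) #X

/* Hypothetical helper for illustration: each argument below is a single
 * non-white-space character that matches no other preprocessing token
 * category, so it forms a token of its own and can be stringified. */
static void check_stray_tokens(void)
{
    /* '@' is not a punctuator and cannot start an identifier. */
    assert(strcmp(STR(@), "@") == 0);

    /* The backtick is likewise outside every other token category. */
    assert(strcmp(STR(`), "`") == 0);
}
```

Calling check_stray_tokens() from main should pass silently. By the same reading of 6.4p3, STR applied to U+0308 or to \u263A would be expected to yield a string literal as well, which is the behavior this report asks for.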