[Bug c/118281] New: Characters and universal character names that are not valid in identifiers are incorrectly rejected before preprocessing

luigighiron at gmail dot com via Gcc-bugs Thu, 02 Jan 2025 21:27:23 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118281


            Bug ID: 118281
           Summary: Characters and universal character names that are not
                    valid in identifiers are incorrectly rejected before
                    preprocessing
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: luigighiron at gmail dot com
  Target Milestone: ---

C allows a character (here meaning a Unicode scalar value) or a universal
character name that cannot begin any other kind of token to form a
preprocessing token of its own:

> The categories of preprocessing tokens are: header names, identifiers,
> preprocessing numbers, character constants, string literals, punctuators, and
> both single universal character names as well as single non-white-space
> characters that do not lexically match the other preprocessing token
> categories.
Section 6.4 "Lexical elements" Paragraph 3 ISO/IEC 9899:2024

For example, given a macro such as #define STR(X)#X it is valid to write
STR(@) even though @ does not match any other preprocessing token category.
However, the following program is incorrectly rejected:

#include<stdio.h>
#define STR(X)#X
int main(){
    puts(STR(̈));
}

The argument of STR there is a combining diaeresis character (U+0308), which
cannot start an identifier, so the invocation of STR should create a string
literal that contains a combining diaeresis character. GCC also incorrectly
rejects the program if the combining diaeresis is replaced with \u0308.
Furthermore, GCC even rejects universal character names that name characters
that would be valid when used directly:

#include<stdio.h>
#define STR(X)#X
int main(){
    puts(STR(\u263A));
}

This program is incorrectly rejected, even though replacing \u263A with ☺ (the
character it names) results in GCC accepting the program.
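
For contrast, here is a minimal sketch of the behaviour that already works and
of the result the first program is expected to produce. The helper names
at_sign_stringizes and expected_diaeresis are hypothetical and only serve the
illustration. The second helper writes the string literal directly, on the
assumption that this is the string STR(\u0308) should yield once the argument
is accepted.

```c
#include <string.h>

/* Same stringizing macro as in the report. */
#define STR(X) #X

/* '@' matches no other preprocessing token category, so it is a
   single-character preprocessing token; stringizing it gives "@". */
static int at_sign_stringizes(void)
{
    return strcmp(STR(@), "@") == 0;
}

/* Assumption: this literal is what STR(\u0308) should produce. Writing the
   universal character name inside a string literal is accepted today. */
static const char *expected_diaeresis(void)
{
    return "\u0308";
}
```

Calling puts(expected_diaeresis()) from main prints the combining diaeresis,
which is the output the rejected program should have had.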
