[Bug c/67224] UTF-8 support for identifier names in GCC

neilb at protonmail dot ch via Gcc-bugs Wed, 16 Apr 2025 22:59:20 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224


Neil Booth <neilb at protonmail dot ch> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |neilb at protonmail dot ch

--- Comment #38 from Neil Booth <neilb at protonmail dot ch> ---
@jsm, I'm curious about your statement:
"You need to test cases such as that if a macro is defined twice, once with 
a UCN in its expansion and once with the equivalent character written in 
UTF-8, the difference in the expansion is diagnosed (whichever of all the 
valid UCNs for that character is the one used)."

My reading of the standards is that a UCN names a character.  A spelling is a
sequence of characters.  Hence there is no difference in spelling between a UCN
naming, say, an emoji and that emoji in the source - the spelling of both is a
single character.
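
As a small illustration of that reading for identifiers, here is a minimal
sketch. It assumes a compiler that accepts UTF-8 in identifiers (GCC 10 or
later, or Clang) and a UTF-8 encoded source file. The function name
read_via_utf8 is made up for the example.

```c
/* Hypothetical example: one identifier, spelled two ways.
   U+00E9 (e with acute accent) is used because it is a valid
   identifier character. */
static int \u00e9 = 1;       /* declared using the UCN spelling */

int read_via_utf8(void)
{
    é = 42;                  /* assigned using the UTF-8 spelling */
    return \u00e9;           /* read back using the UCN spelling */
}
```

If both spellings name the same identifier, this compiles without an
"undeclared identifier" error and read_via_utf8() returns the value assigned
through the UTF-8 spelling.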

This is clear in the wording of the C++ standards.  e.g. C++23 says "The
universal-character-name construct provides a way to name other characters."
where it is referring to characters in the translation character set. The
wording in the C standards is a little ambiguous but I would be surprised if
the intent were different.  After all, there is nothing to be gained by
preserving source code form differences in the preprocessor or compiler - form
differences can only be distinguished when stringized, and there a UCN and the
actual character are indeed the same (and IMO always were).
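
The stringizing case can be probed with a sketch like the one below. The
STR macro and the function name are made up for the example, and the same
assumptions apply (UTF-8 source file, a compiler that accepts UTF-8 in
identifiers). Note that C leaves it implementation-defined whether a
backslash is inserted before a UCN when stringizing, so the result may
differ between implementations.

```c
#include <string.h>

/* Hypothetical helper: stringize the argument as written. */
#define STR(x) #x

/* Returns 1 when stringizing the UCN spelling and the UTF-8 spelling
   yields strings that compare equal at run time, 0 otherwise. */
int stringized_forms_match(void)
{
    return strcmp(STR(\u00e9), STR(é)) == 0;
}
```

This only compares the resulting string contents at run time. It does not
show which spelling the preprocessor keeps internally.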

Clang seems to do a better job in its UCN implementation because it treats a
UCN and the character in names as the same in all ways.
