[Bug c++/91755] New: C++ handling of extended characters is not 100% correct

lhyatt at gmail dot com Thu, 12 Sep 2019 07:39:40 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91755


            Bug ID: 91755
           Summary: C++ handling of extended characters is not 100%
                    correct
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lhyatt at gmail dot com
  Target Milestone: ---

In C++, extended characters in the source (e.g. UTF-8) are technically supposed
to be converted to UCN escapes during translation phase 1. After that, it should
not be detectable whether a UCN or the character itself was used (except in raw
string literals, where the conversion is reverted). GCC does not perform this
transformation. The distinction is visible in only a few places, but one of
them is preprocessor stringizing.

For instance:

==========
#define stringize(x) #x
static_assert(sizeof(stringize("π")) == sizeof(stringize("\U000003C0")),
              "oops");
==========

The above assert should not fire per the letter of the standard, but it does.
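
To make the size difference concrete, here is a minimal sketch that records the
two sizes separately. It assumes UTF-8 source and execution character sets and
the current non-converting behavior described above; the names direct_size and
ucn_size are made up for illustration.

```cpp
#include <cstddef>

#define stringize(x) #x

// Assumption: UTF-8 source and execution character sets.

// Direct spelling: stringizing yields a literal whose contents are a quote,
// the two UTF-8 bytes of U+03C0 (0xCF 0x80), and a quote, plus the NUL.
constexpr std::size_t direct_size = sizeof(stringize("π"));

// UCN spelling: stringizing escapes the backslash, so the contents are a
// quote, a backslash, 'U', eight hex digits, and a quote, plus the NUL.
constexpr std::size_t ucn_size = sizeof(stringize("\U000003C0"));

// What the current behavior produces under the assumptions above.
static_assert(direct_size == 5, "4 bytes of content + NUL");
static_assert(ucn_size == 13, "12 bytes of content + NUL");
```

Compiling this with something like -std=c++11 -fsyntax-only is enough to check
it. If the phase 1 conversion were implemented literally, the two sizes would
be expected to match and the first assertion would fail instead.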

I am not sure it is desirable to fix this, since the existing behavior seems
more intuitive and matches other compilers. But the issue may become a little
more prevalent soon -- as discussed in this thread:
https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00822.html, a patch will be
applied in the near future that enables extended characters in identifiers too.
Similar to the above case, stringizing such an identifier twice will also
expose the distinction between UCN- and direct-specified extended characters.
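
A sketch of the identifier case follows. It assumes that "twice" means nested
stringization through a helper macro, that the compiler accepts UTF-8 in
identifiers (i.e. the patch mentioned above is in place), and that the source
and execution character sets are UTF-8. The macro and variable names are made
up for illustration.

```cpp
#include <cstddef>

// Hypothetical helper macros: STR expands its argument, then stringizes it.
#define STR_IMPL(s) #s
#define STR(s) STR_IMPL(s)

// Stringizing once: the UCN spelling ends up inside a string literal, where
// it is interpreted as the character again, so both literals hold the same
// bytes.
constexpr std::size_t once_direct = sizeof(STR(π));
constexpr std::size_t once_ucn = sizeof(STR(\u03C0));

// Stringizing twice: the inner result is a string literal, so the outer
// stringization escapes its backslash and the UCN spelling survives as text.
constexpr std::size_t twice_direct = sizeof(STR(STR(π)));
constexpr std::size_t twice_ucn = sizeof(STR(STR(\u03C0)));

static_assert(once_direct == once_ucn, "one level hides the spelling");
static_assert(twice_direct != twice_ucn, "two levels expose the spelling");
```

Under the existing behavior both assertions are expected to hold. With a
literal phase 1 conversion, the second one would be expected to fail.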

In the new tests being added for this patch
(gcc/testsuite/g++.dg/cpp/ucnid-2-utf8.C and
gcc/testsuite/g++.dg/cpp/ucnid-3-utf8.C), we test that stringizing works for
identifiers containing extended characters, but we test the existing behavior,
which is technically not standard-conforming. To document the current state of
things, I am filing this bug report so that I can reference it from the new
test cases. If GCC's behavior changes in the future, these new tests will fail
and should be adapted to match.
