On Tue, Sep 10, 2019 at 11:47:22PM +0000, Joseph Myers wrote:
> On Mon, 12 Aug 2019, Lewis Hyatt wrote:
> 
> > Hello-
> > 
> > The attached patch for libcpp adds support for extended characters (e.g.
> > UTF-8) in identifiers. A preliminary version of the patch was posted on
> > PR c/67224 as Comment 26
> > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224#c26) and discussed
> > with Joseph Myers. Here is an updated patch incorporating all feedback
> > received so far. I hope it is suitable now; please let me know if I can
> > do anything else to make it ready for you to apply. I am happy to work on
> > it further, whatever is needed. I can't easily test on anything other
> > than x86_64-linux though. I did bootstrap all languages and run all tests
> > on that platform, everything was good.
> > 
> > The (relatively short) changes to libcpp are included inline here. I
> > attached the test cases as a gzipped patch to avoid any problems with the
> > encoding (the test cases contain some invalid UTF-8 and also other
> > encodings such as latin-1 as part of the testing).
> > 
> > Thanks for taking a look at it!
> 
> Thanks, I think this is OK with a few updates to the documentation.

Attached is a single patch relative to current trunk that incorporates all of
your feedback. I gzipped it like last time just in case the invalid UTF-8 in
the tests presents a problem. The code changes are the same as before other
than comments.

The documentation is now updated. There were a couple of other places that
also seemed reasonable to update; I hope that sounds OK.

I also created the PR about UCN conversion (PR 91755) and added a reference in
the comments for those tests.

Bootstrap was done on Linux x86-64. Testing results:

before patch:
36 XPASS
72 FAIL
1452 XFAIL
9624 UNSUPPORTED
359529 PASS

after patch:
36 XPASS
72 FAIL
1452 XFAIL
9624 UNSUPPORTED
359633 PASS

Thank you.

-Lewis

libcpp/ChangeLog
2019-09-12  Lewis Hyatt  <lhy...@gmail.com>

        PR c/67224
        * charset.c (_cpp_valid_utf8): New function to help lex UTF-8 tokens.
        * internal.h (_cpp_valid_utf8): Declare.
        * lex.c (forms_identifier_p): Use it to recognize UTF-8 identifiers.
        (_cpp_lex_direct): Handle UTF-8 in identifiers and CPP_OTHER tokens.
        Do all work in "default" case to avoid slowing down typical code paths.
        Also handle $ and UCN in the default case for consistency.
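
For readers who have not opened the attachment, here is a minimal sketch of
the kind of check involved when a lexer meets a non-ASCII byte: decode one
UTF-8 sequence and reject malformed input. This is not the code in the patch.
The name decode_utf8_sketch, its signature and its return convention are made
up for illustration, and it makes no attempt to decide whether the decoded
code point is permitted in an identifier.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT the patch's _cpp_valid_utf8.  Decode one UTF-8
   sequence starting at S, with LEN bytes available.  On success store the
   code point in *CP and return the number of bytes consumed (1 to 4).
   Return 0 if the bytes are not well-formed UTF-8.  */
static size_t
decode_utf8_sketch (const unsigned char *s, size_t len, uint32_t *cp)
{
  /* Smallest code point that legitimately needs N bytes, indexed by N.  */
  static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  size_t n, i;
  uint32_t c;

  if (len == 0)
    return 0;

  /* The lead byte determines the sequence length.  */
  if (s[0] < 0x80)
    {
      n = 1;
      c = s[0];
    }
  else if ((s[0] & 0xE0) == 0xC0)
    {
      n = 2;
      c = s[0] & 0x1F;
    }
  else if ((s[0] & 0xF0) == 0xE0)
    {
      n = 3;
      c = s[0] & 0x0F;
    }
  else if ((s[0] & 0xF8) == 0xF0)
    {
      n = 4;
      c = s[0] & 0x07;
    }
  else
    return 0;	/* Stray continuation byte or invalid lead byte.  */

  if (len < n)
    return 0;	/* Truncated sequence.  */

  /* Each continuation byte must look like 10xxxxxx.  */
  for (i = 1; i < n; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
	return 0;
      c = (c << 6) | (s[i] & 0x3F);
    }

  if (c < min_for_len[n])
    return 0;	/* Overlong encoding.  */
  if (c >= 0xD800 && c <= 0xDFFF)
    return 0;	/* UTF-16 surrogates are not valid in UTF-8.  */
  if (c > 0x10FFFF)
    return 0;	/* Beyond the Unicode range.  */

  *cp = c;
  return n;
}
```

With this sketch, the two bytes 0xC3 0xA9 decode to U+00E9, while the overlong
pair 0xC0 0xAF and a lone 0xFF are both rejected. Returning the byte count
lets a caller advance its read pointer only when the sequence is accepted.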

gcc/ChangeLog
2019-09-12  Lewis Hyatt  <lhy...@gmail.com>

        PR c/67224
        * doc/cpp.texi: Document support for extended characters in
        identifiers.
        * doc/cppopts.texi: Likewise.

gcc/testsuite/ChangeLog
2019-09-12  Lewis Hyatt  <lhy...@gmail.com>

        PR c/67224
        * c-c++-common/cpp/ucnid-2011-1-utf8.c: New test.
        * g++.dg/cpp/ucnid-1-utf8.C: New test.
        * g++.dg/cpp/ucnid-2-utf8.C: New test.
        * g++.dg/cpp/ucnid-3-utf8.C: New test.
        * g++.dg/cpp/ucnid-4-utf8.C: New test.
        * g++.dg/other/ucnid-1-utf8.C: New test.
        * gcc.dg/cpp/ucnid-1-utf8.c: New test.
        * gcc.dg/cpp/ucnid-10-utf8.c: New test.
        * gcc.dg/cpp/ucnid-11-utf8.c: New test.
        * gcc.dg/cpp/ucnid-12-utf8.c: New test.
        * gcc.dg/cpp/ucnid-13-utf8.c: New test.
        * gcc.dg/cpp/ucnid-14-utf8.c: New test.
        * gcc.dg/cpp/ucnid-15-utf8.c: New test.
        * gcc.dg/cpp/ucnid-2-utf8.c: New test.
        * gcc.dg/cpp/ucnid-3-utf8.c: New test.
        * gcc.dg/cpp/ucnid-4-utf8.c: New test.
        * gcc.dg/cpp/ucnid-6-utf8.c: New test.
        * gcc.dg/cpp/ucnid-7-utf8.c: New test.
        * gcc.dg/cpp/ucnid-9-utf8.c: New test.
        * gcc.dg/ucnid-1-utf8.c: New test.
        * gcc.dg/ucnid-10-utf8.c: New test.
        * gcc.dg/ucnid-11-utf8.c: New test.
        * gcc.dg/ucnid-12-utf8.c: New test.
        * gcc.dg/ucnid-13-utf8.c: New test.
        * gcc.dg/ucnid-14-utf8.c: New test.
        * gcc.dg/ucnid-15-utf8.c: New test.
        * gcc.dg/ucnid-16-utf8.c: New test.
        * gcc.dg/ucnid-2-utf8.c: New test.
        * gcc.dg/ucnid-3-utf8.c: New test.
        * gcc.dg/ucnid-4-utf8.c: New test.
        * gcc.dg/ucnid-5-utf8.c: New test.
        * gcc.dg/ucnid-6-utf8.c: New test.
        * gcc.dg/ucnid-7-utf8.c: New test.
        * gcc.dg/ucnid-8-utf8.c: New test.
        * gcc.dg/ucnid-9-utf8.c: New test.

Attachment: utf8-identifiers-v2.patch.gz
Description: application/gunzip
