On Tue, Sep 10, 2019 at 11:47:22PM +0000, Joseph Myers wrote:
> On Mon, 12 Aug 2019, Lewis Hyatt wrote:
>
> > Hello-
> >
> > The attached patch for libcpp adds support for extended characters (e.g.
> > UTF-8) in identifiers. A preliminary version of the patch was posted on
> > PR c/67224 as Comment 26
> > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224#c26) and discussed
> > with Joseph Myers. Here is an updated patch incorporating all feedback
> > received so far. I hope it is suitable now; please let me know if I can
> > do anything else to make it ready for you to apply. I am happy to work on
> > it further, whatever is needed. I can't easily test on anything other
> > than x86_64-linux though. I did bootstrap all languages and run all tests
> > on that platform, everything was good.
> >
> > The (relatively short) changes to libcpp are included inline here. I
> > attached the test cases as a gzipped patch to avoid any problems with the
> > encoding (the test cases contain some invalid UTF-8 and also other
> > encodings such as latin-1 as part of the testing).
> >
> > Thanks for taking a look at it!
>
> Thanks, I think this is OK with a few updates to the documentation.

Attached is a single patch relative to current trunk that incorporates all of
your feedback. I gzipped it like last time just in case the invalid UTF-8 in
the tests presents a problem. The code changes are the same as before other
than comments. The documentation is now updated; there were a couple of other
places that also seemed reasonable to me to update, so I hope that sounds OK.
I also created the PR about UCN conversion (PR 91755) and added a reference
to it in the comments for those tests.

Bootstrap was done on Linux x86-64. Testing results:

before patch:
     36 XPASS
     72 FAIL
   1452 XFAIL
   9624 UNSUPPORTED
 359529 PASS

after patch:
     36 XPASS
     72 FAIL
   1452 XFAIL
   9624 UNSUPPORTED
 359633 PASS

Thank you.

-Lewis

libcpp/ChangeLog

2019-09-12  Lewis Hyatt  <lhy...@gmail.com>

	PR c/67224
	* charset.c (_cpp_valid_utf8): New function to help lex UTF-8 tokens.
	* internal.h (_cpp_valid_utf8): Declare.
	* lex.c (forms_identifier_p): Use it to recognize UTF-8 identifiers.
	(_cpp_lex_direct): Handle UTF-8 in identifiers and CPP_OTHER tokens.
	Do all work in "default" case to avoid slowing down typical code paths.
	Also handle $ and UCN in the default case for consistency.

gcc/ChangeLog

2019-09-12  Lewis Hyatt  <lhy...@gmail.com>

	PR c/67224
	* doc/cpp.texi: Document support for extended characters in
	identifiers.
	* doc/cppopts.texi: Likewise.

gcc/testsuite/ChangeLog

2019-09-12  Lewis Hyatt  <lhy...@gmail.com>

	PR c/67224
	* c-c++-common/cpp/ucnid-2011-1-utf8.c: New test.
	* g++.dg/cpp/ucnid-1-utf8.C: New test.
	* g++.dg/cpp/ucnid-2-utf8.C: New test.
	* g++.dg/cpp/ucnid-3-utf8.C: New test.
	* g++.dg/cpp/ucnid-4-utf8.C: New test.
	* g++.dg/other/ucnid-1-utf8.C: New test.
	* gcc.dg/cpp/ucnid-1-utf8.c: New test.
	* gcc.dg/cpp/ucnid-10-utf8.c: New test.
	* gcc.dg/cpp/ucnid-11-utf8.c: New test.
	* gcc.dg/cpp/ucnid-12-utf8.c: New test.
	* gcc.dg/cpp/ucnid-13-utf8.c: New test.
	* gcc.dg/cpp/ucnid-14-utf8.c: New test.
	* gcc.dg/cpp/ucnid-15-utf8.c: New test.
	* gcc.dg/cpp/ucnid-2-utf8.c: New test.
	* gcc.dg/cpp/ucnid-3-utf8.c: New test.
	* gcc.dg/cpp/ucnid-4-utf8.c: New test.
	* gcc.dg/cpp/ucnid-6-utf8.c: New test.
	* gcc.dg/cpp/ucnid-7-utf8.c: New test.
	* gcc.dg/cpp/ucnid-9-utf8.c: New test.
	* gcc.dg/ucnid-1-utf8.c: New test.
	* gcc.dg/ucnid-10-utf8.c: New test.
	* gcc.dg/ucnid-11-utf8.c: New test.
	* gcc.dg/ucnid-12-utf8.c: New test.
	* gcc.dg/ucnid-13-utf8.c: New test.
	* gcc.dg/ucnid-14-utf8.c: New test.
	* gcc.dg/ucnid-15-utf8.c: New test.
	* gcc.dg/ucnid-16-utf8.c: New test.
	* gcc.dg/ucnid-2-utf8.c: New test.
	* gcc.dg/ucnid-3-utf8.c: New test.
	* gcc.dg/ucnid-4-utf8.c: New test.
	* gcc.dg/ucnid-5-utf8.c: New test.
	* gcc.dg/ucnid-6-utf8.c: New test.
	* gcc.dg/ucnid-7-utf8.c: New test.
	* gcc.dg/ucnid-8-utf8.c: New test.
	* gcc.dg/ucnid-9-utf8.c: New test.

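
For anyone reading the ChangeLog without the attachment: the charset.c entry describes _cpp_valid_utf8 as a helper for lexing UTF-8 tokens. The following is only a rough sketch in C of that general idea (decode one UTF-8 sequence, reject malformed input, then ask whether the code point may appear in an identifier). It is not the code from the patch. The names decode_utf8, toy_is_ident_char and ident_prefix_len are hypothetical, and the "anything at or above U+00A0" rule is a placeholder assumption; a real lexer would consult the character ranges permitted by the relevant C or C++ standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (NOT libcpp's _cpp_valid_utf8): decode one UTF-8
   sequence from s[0..len).  On success, store the code point in *cp and
   return the number of bytes consumed (1 to 4).  Return 0 if the bytes
   are not well-formed UTF-8.  */
static size_t
decode_utf8 (const unsigned char *s, size_t len, uint32_t *cp)
{
  assert (s != NULL && cp != NULL);
  if (len == 0)
    return 0;

  unsigned char b = s[0];
  size_t n;          /* expected sequence length */
  uint32_t c, min;   /* accumulated value, smallest non-overlong value */

  if (b < 0x80)
    {
      *cp = b;
      return 1;
    }
  else if ((b & 0xE0) == 0xC0) { n = 2; c = b & 0x1F; min = 0x80; }
  else if ((b & 0xF0) == 0xE0) { n = 3; c = b & 0x0F; min = 0x800; }
  else if ((b & 0xF8) == 0xF0) { n = 4; c = b & 0x07; min = 0x10000; }
  else
    return 0;        /* stray continuation byte or invalid lead byte */

  if (len < n)
    return 0;        /* truncated sequence */

  for (size_t i = 1; i < n; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;    /* not a continuation byte */
      c = (c << 6) | (s[i] & 0x3F);
    }

  /* Reject overlong forms, surrogates, and values past U+10FFFF.  */
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;

  *cp = c;
  return n;
}

/* Placeholder rule (an assumption for illustration only): ASCII letters,
   '_', digits when not in first position, and any code point >= U+00A0.  */
static int
toy_is_ident_char (uint32_t cp, int first)
{
  if (cp < 0x80)
    return cp == '_'
           || (cp >= 'a' && cp <= 'z')
           || (cp >= 'A' && cp <= 'Z')
           || (!first && cp >= '0' && cp <= '9');
  return cp >= 0xA0;
}

/* Return the length in bytes of the longest identifier-like prefix of
   s[0..len).  Invalid UTF-8 simply ends the identifier.  */
static size_t
ident_prefix_len (const unsigned char *s, size_t len)
{
  size_t pos = 0;
  while (pos < len)
    {
      uint32_t cp;
      size_t n = decode_utf8 (s + pos, len - pos, &cp);
      if (n == 0 || !toy_is_ident_char (cp, pos == 0))
        break;
      pos += n;
    }
  return pos;
}
```

In this sketch a lone latin-1 byte such as 0xE9 fails to decode and so ends the identifier, which is roughly the kind of input the invalid-UTF-8 and latin-1 test cases mentioned above are meant to exercise; how the actual patch tokenizes such bytes is defined by the patch itself, not by this example.
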
utf8-identifiers-v2.patch.gz
Description: application/gunzip