aaron.ballman created this revision.
aaron.ballman added reviewers: tahonermann, cor3ntin, clang-language-wg.
Herald added a project: All.
aaron.ballman requested review of this revision.
Herald added a project: clang.

WG21 P1949 <https://reviews.llvm.org/P1949> and WG14 N2836 were both adopted 
without any deprecation period for users who were making use of now-invalid 
characters in their identifiers. This caused some amount of pain for people, 
especially folks using mathematical symbols in their identifiers, which makes 
it hard to upgrade to newer Clang versions with this diagnostic as an error.

This patch downgrades the error to be a warning which defaults to an error. 
This continues to signal to users that the identifiers are in fact invalid, but 
it gives a grace period for people to update their code bases to the new rules 
(or, alternatively, time for the Unicode consortium to determine whether some 
of these characters should be allowed in an identifier in the immutable set). I 
am tentatively documenting that grace period as being until we ship Clang 18 
(which should give users a year or two to migrate their code).

Note, because we implemented the UAX31 rules in Clang 14, I would like to try 
to land this patch and backport it to Clang 15 so that we avoid having *two* 
releases with no easy migration path for this subset of users. I think this is 
reasonable because we're not altering the implementation logic at all, just 
modifying what diagnostic kind is emitted by default. If someone has a concern 
about wanting to cherry pick this so late in the cycle, please share those 
concerns ASAP.

Fixes #54732


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D132877

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/test/Preprocessor/utf8-allowed-chars.c

Index: clang/test/Preprocessor/utf8-allowed-chars.c
===================================================================
--- clang/test/Preprocessor/utf8-allowed-chars.c
+++ clang/test/Preprocessor/utf8-allowed-chars.c
@@ -1,58 +1,32 @@
-// RUN: %clang_cc1 %s -fsyntax-only -std=c99 -verify
-// RUN: %clang_cc1 %s -fsyntax-only -std=c11 -Wc99-compat -verify
-// RUN: %clang_cc1 %s -fsyntax-only -x c++ -std=c++03 -Wc++11-compat -verify
-// RUN: %clang_cc1 %s -fsyntax-only -x c++ -std=c++11 -Wc++98-compat -verify
+// RUN: %clang_cc1 %s -fsyntax-only -std=c99 -verify=expected,c99
+// RUN: %clang_cc1 %s -fsyntax-only -std=c11 -Wc99-compat -verify=expected,c11
+// RUN: %clang_cc1 %s -fsyntax-only -x c++ -std=c++03 -Wc++11-compat -verify=expected,cpp
+// RUN: %clang_cc1 %s -fsyntax-only -x c++ -std=c++11 -Wc++98-compat -verify=expected,cpp
+// RUN: %clang_cc1 %s -fsyntax-only -Wno-invalid-identifier-character -std=c2x -verify=off
+// RUN: %clang_cc1 %s -fsyntax-only -Wno-invalid-identifier-character -x c++ -std=c++2b -verify=off
 
 // Note: This file contains Unicode characters; please do not remove them!
 
 // Identifier characters
-extern char aǶ; // C11, C++11
+extern char aǶ; // c11-warning {{using this character in an identifier is incompatible with C99}} \
+                   c99-error {{not allowed in an identifier}}
 extern char aª; // C99, C11, C++11
-extern char a΄; // C++03, C11, C++11
+extern char a΄; // cpp-error {{not allowed in an identifier}} \
+                   c11-warning {{using this character in an identifier is incompatible with C99}} \
+                   c99-error {{not allowed in an identifier}}
 extern char a๐; // C99, C++03, C11, C++11
-extern char a﹅; // none
-extern char x̀; // C11, C++11. Note that this does not have a composed form.
-
-
+extern char a﹅; // expected-error {{not allowed in an identifier}}
 
+// Note that this does not have a composed form.
+extern char x̀; // c11-warning {{using this character in an identifier is incompatible with C99}} \
+                   c99-error {{not allowed in an identifier}}
 
 // Identifier initial characters
-extern char ๐; // C++03, C11, C++11
-extern char ̀; // disallowed initially in C11/C++, always in C99
-
-
-
-
-#if __cplusplus
-// expected-error@11 {{not allowed in an identifier}}
-// expected-error@13 {{not allowed in an identifier}}
-// expected-error@20 {{character <U+0E50> not allowed at the start of an identifier}}
-// expected-error@21 {{character <U+0300> not allowed at the start of an identifier}}
-// expected-warning@20 {{declaration does not declare anything}}
-// expected-warning@21 {{declaration does not declare anything}}
-
-#else
-# if __STDC_VERSION__ >= 201112L
-// C11
-// expected-warning@9 {{using this character in an identifier is incompatible with C99}}
-// expected-warning@11 {{using this character in an identifier is incompatible with C99}}
-// expected-error@13 {{not allowed in an identifier}}
-// expected-warning@14 {{using this character in an identifier is incompatible with C99}}
-// expected-warning@20 {{starting an identifier with this character is incompatible with C99}}
-// expected-warning@21 {{declaration does not declare anything}}
-// expected-error@21 {{not allowed at the start of an identifier}}
-
-# else
-// C99
-// expected-error@9 {{not allowed in an identifier}}
-// expected-error@11 {{not allowed in an identifier}}
-// expected-error@13 {{not allowed in an identifier}}
-// expected-error@14 {{not allowed in an identifier}}
-// expected-error@20 {{character <U+0E50> not allowed at the start of an identifier}}
-// expected-error@21 {{unexpected character <U+0300>}}
-// expected-warning@20 {{declaration does not declare anything}}
-// expected-warning@21 {{declaration does not declare anything}}
-
-
-# endif
-#endif
+extern char ๐a; // cpp-error {{character <U+0E50> not allowed at the start of an identifier}} \
+                   c11-warning {{starting an identifier with this character is incompatible with C99}} \
+                   c99-error {{character <U+0E50> not allowed at the start of an identifier}}
+extern char ̀; // cpp-error {{character <U+0300> not allowed at the start of an identifier}} \
+                 c11-error {{character <U+0300> not allowed at the start of an identifier}} \
+                 c99-error {{unexpected character <U+0300>}} \
+                 expected-warning {{declaration does not declare anything}} \
+                 off-warning {{declaration does not declare anything}}
Index: clang/include/clang/Basic/DiagnosticLexKinds.td
===================================================================
--- clang/include/clang/Basic/DiagnosticLexKinds.td
+++ clang/include/clang/Basic/DiagnosticLexKinds.td
@@ -121,8 +121,13 @@
   "invalid UTF-8 in comment">, InGroup<DiagGroup<"invalid-utf8">>;
 def err_character_not_allowed : Error<
   "unexpected character <U+%0>">;
-def err_character_not_allowed_identifier : Error<
-  "character <U+%0> not allowed %select{in|at the start of}1 an identifier">;
+// This warning defaults to an error because there was no deprecation period
+// in either C or C++ for invalid UAX31 characters; we'll let users downgrade
+// this error for a few releases, but we anticipate turning this back into a
+// hard error by Clang 18.
+def err_character_not_allowed_identifier : Warning<
+  "character <U+%0> not allowed %select{in|at the start of}1 an identifier">,
+  InGroup<DiagGroup<"invalid-identifier-character">>, DefaultError;
 def ext_unicode_whitespace : ExtWarn<
   "treating Unicode character as whitespace">,
   InGroup<DiagGroup<"unicode-whitespace">>;
Index: clang/docs/ReleaseNotes.rst
===================================================================
--- clang/docs/ReleaseNotes.rst
+++ clang/docs/ReleaseNotes.rst
@@ -116,6 +116,16 @@
 - Correctly diagnose a future keyword if it exist as a keyword in the higher
   language version and specifies in which version it will be a keyword. This
   supports both c and c++ language.
+- Added the ``-Winvalid-identifier-character`` warning group to identify
+  identifier characters which are invalid according to UAX31 but were
+  previously allowed. This warning defaults to an error because these
+  characters are invalid according to both the C and C++ language
+  specifications, but because there was no deprecation period for those
+  characters, we allow users to downgrade the error back into a warning with
+  ``-Wno-error=invalid-identifier-character``. It is possible to disable the
+  warning entirely with ``-Wno-invalid-identifier-character`` but this is not
+  advised because we intend to turn the warning back into a hard error in Clang
+  18 after giving users a chance to update their code.
 
 Non-comprehensive list of changes in this release
 -------------------------------------------------
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to