On Mon, Aug 16, 2021 at 07:45:55PM -0400, Jason Merrill wrote:
> On 8/16/21 4:51 PM, Jakub Jelinek wrote:
> > On Mon, Aug 16, 2021 at 04:21:00PM -0400, Jason Merrill wrote:
> > > > I see for the UTF-8 chars we have:
> > > >   switch (ucn_valid_in_identifier (pfile, *cp, nst))
> > > >     {
> > > >     case 0:
> > > >       /* In C++, this is an error for invalid character in an identifier
> > > >          because logically, the UTF-8 was converted to a UCN during
> > > >          translation phase 1 (even though we don't physically do it that
> > > >          way).  In C, this byte rather becomes grammatically a separate
> > > >          token.  */
> > > >       if (CPP_OPTION (pfile, cplusplus))
> > > >         cpp_error (pfile, CPP_DL_ERROR,
> > > >                    "extended character %.*s is not valid in an identifier",
> > > >                    (int) (*pstr - base), base);
> > > >       else
> > > >         {
> > > >           *pstr = base;
> > > >           return false;
> > > >         }
> > > > So, shall we behave the same as C for cxx23_identifiers here?  And shall we
> > > > do something similar for the UCNs in \uxxxx and \Uxxxxxxxx forms?
> > > > Confused...
> > >
> > > I tend to agree with Joseph's comment on your followup patch about this
> > > issue; do you?
> >
> > It isn't clear to me if it is ok that it is an error even with just -E,
> > i.e. whether
> > "If a single universal-character-name does not match any of the other
> > preprocessing token categories, the program is ill-formed."
> > applies already in translation phase 4 which is what -E emits (or some other
> > one?), or only in phase 7 when converting preprocessing tokens to tokens.
>
> I read it as applying in phase 3.

Ok, follow-up patch withdrawn.

	Jakub