On Wed, 11 Sep 2019, Lewis Hyatt wrote:

> things that may be a little surprising. For instance, you can take a
> UTF-8 encoded file and insert a backslash line continuation in the
> middle of a multibyte sequence, and gcc will happily paste it back
> together and then interpret the resulting UTF-8. I think it's
> technically OK standardwise since the conversion from extended
> characters to the source character set is implementation-defined, but
> it's hardly a straightforward definition. It is sort of consistent
> with the treatment of undefined behavior with UCN escapes though,
> which gcc already permits to be pasted together over a line
> continuation. Anyway, should this behavior be documented as well? I

I don't think that peculiarity should be documented.  (Whereas accepting
arbitrary bytes inside comments and strings by default is arguably
actually a feature.)

> > gcc/testsuite/g++.dg/cpp/ucnid-2-utf8.C and
> > gcc/testsuite/g++.dg/cpp/ucnid-3-utf8.C are testing double stringizing in
> > C++, where strictly the results they expect show that GCC does not conform
> > to the C++ standard requirement to convert all extended characters to UCNs
> > (because C++ does not have the special C rule making it
> > implementation-defined whether the \ of a UCN in a string literal is
> > doubled when stringizing).
>
> Thanks, I didn't mean to ignore this point when you made it on the PR
> comments, I just wasn't sure what was the best way to handle it. Do
> you find it preferable to just add a comment, or should I rather
> change the test to look for the standard-conforming output, and make
> it an XFAIL?

My inclination would be a comment, with reference to a bug filed for this
issue in Bugzilla.

> Finally, one general question, when I submit these last changes, is it
> better to send them as a new patch relative to what I already sent, or
> is it better to send the whole thing updated from scratch? Thanks
> again.

A complete patch that can be applied to trunk is best.

-- 
Joseph S. Myers
jos...@codesourcery.com