------- Comment #2 from kkylheku at gmail dot com 2009-01-28 16:30 -------
(In reply to comment #1)
> Confirmed.

Thanks. By the way, I started looking at patching this.

My suspicions were confirmed that this is a case of pasting together invalid
tokens. The compiler sees the tokens individually, because it's closely
integrated with the preprocessor. But when the tokens are converted to text,
they resemble a valid string literal. The embedded newlines are gone.

What's happening is that the input (where each \n marks a physical newline in
the source):

  "Hello,\n %s!",\n "world");

is being tokenized like this:

  {"Hello,}{%}{s}{!}{",}{"world"}{)}

The tokens {"Hello,} and {",} are assigned the special lexical category
CPP_OTHER, because they are improper tokens. Of course % and ! are operators
and s is a CPP_NAME identifier. Also note how everything becomes one argument
to the macro, since the comma is never seen as an independent token.

A possible way to fix this bug would be for the function lex_string not to
back up over the \n that is found in the middle of a string literal, so that
the newline becomes part of the CPP_OTHER token.

This behavior might have to be language-dependent, though. It looks like
assembly language programs may be depending on the current behavior, hence
this test in lex_string:

  if (type == CPP_OTHER && CPP_OPTION (pfile, lang) != CLK_ASM)
    cpp_error (pfile, CPP_DL_PEDWARN, "missing terminating %c character",
               (int) terminator);

Or maybe CPP_OTHER tokens should never be pasted together with anything that
follows them, because even inserting a space is not good enough; maybe a
newline should be emitted between a CPP_OTHER token and the next token instead
of a space, if the language is other than CLK_ASM.

Will experiment.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38990
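
To make the tokenization described above easier to follow, here is a minimal toy lexer in C. It is only a sketch and is not cpplib code: the names toy_lex and emit are hypothetical, and the rules are simplified (whitespace is skipped, identifiers are alphanumeric runs, every other character is a one-character token). The one behavior it tries to mirror is the one discussed for lex_string: when a newline is hit inside a string literal, the lexer stops before the newline, so the newline is not part of the resulting improper token.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Append "{tok}" to the NUL-terminated buffer out.
   Returns 1 on success, 0 if the buffer is too small.  */
static int emit(char *out, size_t outsz, const char *tok, size_t len)
{
    size_t used = strlen(out);

    if (used + len + 3 > outsz)   /* '{' + tok + '}' + NUL */
        return 0;
    out[used] = '{';
    memcpy(out + used + 1, tok, len);
    out[used + 1 + len] = '}';
    out[used + 2 + len] = '\0';
    return 1;
}

/* Hypothetical toy lexer, NOT the real cpplib lexer.
   Writes the tokens of src into out as {tok}{tok}...
   Returns the number of unterminated string literals seen
   (the analogue of CPP_OTHER tokens here), or -1 on overflow.  */
int toy_lex(const char *src, char *out, size_t outsz)
{
    const char *p = src;
    int unterminated = 0;

    assert(outsz > 0);
    out[0] = '\0';

    while (*p != '\0') {
        const char *start = p;

        if (isspace((unsigned char) *p)) {
            p++;                  /* newlines vanish here, like any whitespace */
            continue;
        }
        if (*p == '"') {
            p++;
            while (*p != '\0' && *p != '\n' && *p != '"') {
                /* Skip an escaped character, but never a newline.  */
                if (*p == '\\' && p[1] != '\0' && p[1] != '\n')
                    p++;
                p++;
            }
            if (*p == '"')
                p++;              /* proper literal: take the closing quote */
            else
                unterminated++;   /* stop before the newline: improper token */
        } else if (isalnum((unsigned char) *p) || *p == '_') {
            while (isalnum((unsigned char) *p) || *p == '_')
                p++;
        } else {
            p++;                  /* single-character operator or punctuator */
        }
        if (!emit(out, outsz, start, (size_t) (p - start)))
            return -1;
    }
    return unterminated;
}
```

Calling toy_lex on the input from this report (with real newline characters after `"Hello,` and after `",`) fills the buffer with {"Hello,}{%}{s}{!}{",}{"world"}{)}{;} and returns 2. The comma only ever appears inside an improper token, and once the tokens are written out separated by spaces rather than newlines, the pieces read like one valid string literal.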