------- Comment #2 from kkylheku at gmail dot com  2009-01-28 16:30 -------
(In reply to comment #1)
> Confirmed.

Thanks. By the way, I started looking at patching this. My suspicions were
confirmed that this is a case of pasting together invalid tokens. The compiler
sees the tokens individually, because it's closely integrated with the
preprocessor. But when the tokens are converted to text, they resemble a valid
string literal. The embedded newlines are gone.

What's happening is that the input:

"Hello,\n      %s!",\n      "world");

is being tokenized like this:

{"Hello,}{%}{s}{!}{",}{"world"}{)}

The "Hello, and ", tokens are assigned the special lexical category CPP_OTHER,
because they are improper tokens. Of course % is an operator and s is a
CPP_NAME identifier.  Also note how everything becomes one argument to the
macro, since the comma is never seen as an independent token.
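
To make the tokenization above concrete, here is a toy model in C. It is only
a sketch and is not cpplib code: token_length and tokenize are hypothetical
names, and the only rule borrowed from the description above is that a string
literal which hits a newline before its closing quote ends just before that
newline, the way a CPP_OTHER token does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Toy model, NOT cpplib: returns the length of the token starting at p. */
size_t token_length(const char *p)
{
    if (*p == '"') {
        size_t n = 1;
        while (p[n] != '\0' && p[n] != '\n') {
            /* Skip an escaped character such as \" inside the literal. */
            if (p[n] == '\\' && p[n + 1] != '\0' && p[n + 1] != '\n') {
                n += 2;
                continue;
            }
            if (p[n] == '"')
                return n + 1;       /* properly terminated string literal */
            n++;
        }
        return n;   /* unterminated: stop before the newline (CPP_OTHER-like) */
    }
    if (isalpha((unsigned char)*p) || *p == '_') {
        size_t n = 1;
        while (isalnum((unsigned char)p[n]) || p[n] == '_')
            n++;
        return n;                   /* identifier */
    }
    return 1;                       /* any other character is its own token */
}

/* Writes the tokens of src into out in the {tok}{tok} notation used above. */
void tokenize(const char *src, char *out, size_t outsz)
{
    size_t used = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    while (*src != '\0') {
        if (isspace((unsigned char)*src)) {
            src++;                  /* whitespace, including \n, separates tokens */
            continue;
        }
        size_t len = token_length(src);
        if (used + len + 3 > outsz)
            break;                  /* no room for "{tok}" plus the NUL */
        out[used++] = '{';
        memcpy(out + used, src, len);
        used += len;
        out[used++] = '}';
        out[used] = '\0';
        src += len;
    }
}
```

Fed the input shown earlier, this toy splits both unterminated pieces off at
the newline, so the comma stays glued inside the {",} token. It also emits
the ! and the trailing ; as separate one-character tokens.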

A possible fix would be for the function lex_string not to back up over the \n
that is found in the middle of a string literal, so that the newline becomes
part of the CPP_OTHER token.  This behavior might have to be
language-dependent, though. It looks like assembly language programs may be
depending on the current behavior, hence this test in lex_string:

  if (type == CPP_OTHER && CPP_OPTION (pfile, lang) != CLK_ASM)
    cpp_error (pfile, CPP_DL_PEDWARN, "missing terminating %c character",
               (int) terminator);

Or maybe CPP_OTHER tokens should never be pasted together with anything that
follows them because even inserting a space is not good enough; maybe a newline
should be emitted between CPP_OTHER and the next token instead of a space, if
the language is other than CLK_ASM.
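
The second idea can be sketched with another toy in C. Everything here is
hypothetical (toy_type, toy_token and spell_tokens are not cpplib names), and
unlike the real preprocessor it puts a separator between every pair of
tokens. It only shows why a space lets the pieces read as one string literal,
while a newline after each CPP_OTHER-like token keeps them apart.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for CPP_OTHER and "everything else". */
enum toy_type { TOY_OTHER, TOY_NORMAL };

struct toy_token {
    enum toy_type type;
    const char *text;
};

/* Spells tokens back to text. The separator after a TOY_OTHER token is a
   newline when newline_after_other is nonzero, otherwise a space. */
void spell_tokens(const struct toy_token *toks, size_t count,
                  int newline_after_other, char *out, size_t outsz)
{
    size_t used = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(toks[i].text);
        if (used + len + 2 > outsz)
            break;              /* need room for text, separator and NUL */
        memcpy(out + used, toks[i].text, len);
        used += len;
        if (i + 1 < count) {
            int nl = toks[i].type == TOY_OTHER && newline_after_other;
            out[used++] = nl ? '\n' : ' ';
        }
        out[used] = '\0';
    }
}
```

With newline_after_other set to 0, the tokens from the example come out on
one line and the two stray quotes pair up into what looks like a valid
literal. With it set to 1, each improper token is followed by a newline, so
the text stays unterminated the way the original input was. A real change
would presumably key this on the language not being CLK_ASM.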

Will experiment.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38990
