Command line options, in particular -D options, should be interpreted in
the locale character set (possibly subject to a -finput-charset override).
At present, however, the expansion of a -D option is not subject to
character set translation.
Consider the program

    char *s = S;

compiled with the following command with LC_CTYPE=en_GB.ISO-8859-1:

    gcc -S -finput-charset=ISO-8859-1 -fexec-charset=UTF-8 -DS=\"�\" t.c

The string in the output program consists of a single byte rather than
being translated to UTF-8. But the similar program, encoded in
ISO-8859-1,

    char *s = "�";

compiled with the same options, in the same locale, has a properly
UTF-8-encoded string in the assembly output.
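
To make the expected translation concrete, here is a minimal sketch of an
ISO-8859-1 to UTF-8 conversion, which is what -fexec-charset=UTF-8 would
have to do to the -D expansion. The helper name latin1_to_utf8 is
hypothetical and is not taken from GCC or cpplib, and the byte 0xE9 used
in the usage note is chosen only for illustration, since the character in
the report did not survive.

```c
#include <stddef.h>

/* Hypothetical helper, not part of GCC: convert in_len ISO-8859-1 bytes
   to NUL-terminated UTF-8 in out.  Returns the number of bytes written
   (excluding the NUL), or (size_t)-1 if out_size is too small.  */
size_t latin1_to_utf8(const unsigned char *in, size_t in_len,
                      unsigned char *out, size_t out_size)
{
    size_t o = 0;

    for (size_t i = 0; i < in_len; i++) {
        unsigned char c = in[i];
        size_t need = (c < 0x80) ? 1 : 2;

        /* Always leave room for the terminating NUL.  */
        if (o + need + 1 > out_size)
            return (size_t)-1;

        if (c < 0x80) {
            /* ASCII is identical in both encodings.  */
            out[o++] = c;
        } else {
            /* U+0080..U+00FF become two bytes: 110xxxxx 10xxxxxx.  */
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }

    if (o + 1 > out_size)
        return (size_t)-1;
    out[o] = '\0';
    return o;
}
```

For a one-byte input of 0xE9 this yields the two bytes 0xC3 0xA9, which is
the kind of output the source-file case produces and the -D case does not.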
If we get extended identifiers (bug 9449) then the same will apply to the
macro names and parameter names in -D and -U options, not just their
expansions.

I think the -D and -U arguments should just have the same character set
translations applied as are done to source files. For C++, once it is
implemented for source files, that includes the conversion of extended
characters to UCNs in phase 1.
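
As a rough illustration of the phase 1 conversion mentioned above, the
following sketch spells a single ISO-8859-1 byte as a UCN. The helper name
latin1_byte_to_ucn is hypothetical; it is not how cpplib does (or would
do) this, and it deliberately leaves bytes 0x80 to 0x9F unhandled.

```c
#include <stdio.h>

/* Hypothetical helper, not part of GCC: write the source spelling of one
   ISO-8859-1 byte into out.  ASCII bytes are copied unchanged; bytes
   0xA0..0xFF are written as a \uXXXX universal character name, since
   ISO-8859-1 byte values equal their Unicode code points.  Returns the
   length written, or -1 for bytes 0x80..0x9F (C1 controls, which this
   sketch does not handle) or if out is too small.  */
int latin1_byte_to_ucn(unsigned char c, char *out, size_t out_size)
{
    int n;

    if (c < 0x80)
        n = snprintf(out, out_size, "%c", c);
    else if (c >= 0xA0)
        n = snprintf(out, out_size, "\\u%04X", (unsigned)c);
    else
        return -1;

    /* snprintf reports the length it wanted; reject truncated output.  */
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```

Under this sketch, a -D expansion containing the byte 0xE9 would be seen
by later phases as the six characters \u00E9.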
--
Summary: -D option handling doesn't account for character sets
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: preprocessor
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jsm28 at gcc dot gnu dot org
CC: gcc-bugs at gcc dot gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20183