https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92987

            Bug ID: 92987
           Summary: -finput-charset is only usable with encodings that are
                    supersets of ASCII
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: preprocessor
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lhyatt at gmail dot com
  Target Milestone: ---

-finput-charset supports converting all encodings supported by iconv, and also
UTF-32 and UTF-16 are supported directly with routines in libcpp/charset.c.
However, -finput-charset does not seem to actually be usable unless the chosen
encoding is a superset of ASCII, because it applies to all header files
included from the source as well. Even an empty source file implicitly
includes /usr/include/stdc-predef.h, and so there is nothing that can be
compiled with say -finput-charset=UTF-32LE:

$ echo -n > t.c
$ gcc -S -finput-charset=UTF-32LE t.c
cc1: error: failure to convert UTF-32LE to UTF-8

The error comes while processing stdc-predef.h.

I was about to work on adding support for -finput-charset into diagnostics
infrastructure (it currently ignores it), however it seems like this issue
should probably be dealt with first, since it may entail adding the notion
that different source files have a different input encoding.

I am not sure what would be the desired way to address it. Are there use cases
where it is desirable that -finput-charset applies to the #includes too (I
guess systems could exist where the system headers are not ASCII)? Would it
make sense to add a new option that changed the charset only for source files,
and not the #includes? Or maybe it should be kept for "..." includes and not
for <...> or something like this?

-Lewis
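
For illustration only, here is a minimal sketch of why an ASCII header such as stdc-predef.h cannot be decoded as UTF-32LE while the empty t.c can. The helper name is_valid_utf32le is hypothetical and the code is not taken from libcpp/charset.c; it only checks the well-formedness rules of UTF-32 (whole 4-byte code units, values no greater than 0x10FFFF, no surrogates), on the assumption that any conforming converter has to reject input that violates them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not GCC's code): returns 1 if buf holds
   well-formed UTF-32LE, 0 otherwise. */
int is_valid_utf32le(const unsigned char *buf, size_t len)
{
    if (len % 4 != 0)
        return 0;                       /* must be whole 4-byte code units */

    for (size_t i = 0; i < len; i += 4) {
        /* Assemble one little-endian 32-bit code unit. */
        uint32_t cp = (uint32_t)buf[i]
                    | (uint32_t)buf[i + 1] << 8
                    | (uint32_t)buf[i + 2] << 16
                    | (uint32_t)buf[i + 3] << 24;

        if (cp > 0x10FFFF)
            return 0;                   /* beyond the Unicode code space */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* surrogates are not scalar values */
    }
    return 1;
}
```

Four consecutive ASCII bytes, for example the opening "/* C" of a comment, read as the single code unit 0x43202A2F, which is far above 0x10FFFF, so the check fails on the first four bytes of an ASCII header. A zero-length buffer passes, which is consistent with the empty t.c not being the file that triggers the error.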