[Bug other/28315] gcc doesn't use locale for default input charset

lacos at caesar dot elte.hu Fri, 29 Mar 2013 06:35:32 -0700


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28315




Laszlo Ersek <lacos at caesar dot elte.hu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bonzini at gnu dot org,
                   |                            |lacos at caesar dot elte.hu



--- Comment #1 from Laszlo Ersek <lacos at caesar dot elte.hu> 2013-03-29 13:17:21 UTC ---

gcc has defaulted to UTF-8 rather than the locale's codeset in
_cpp_default_encoding() [libcpp/charset.c] since the following 2004 hunk:

    http://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d856c8a6#patch25



(
  The default encoding is selected for both "input_charset" (overridable
  with -finput-charset) and "narrow_charset" (overridable with
  -fexec-charset):

    cpp_create_reader() [libcpp/init.c]
      ~ narrow_charset = _cpp_default_encoding()
      ~ input_charset = _cpp_default_encoding()

  The "overrides" are implemented in c_common_handle_option()
  [gcc/c-family/c-opts.c].
)



Considering the encodings of source files "in the wild" that gcc has been
used to compile in the last 8+ years (i.e., while the "&& 0" disabling the
locale lookup has been in place):



- UTF-8 (of which 7-bit ASCII is a subset) worked.

- Any non-UTF-8 encoding that used the most significant bit (e.g.
  ISO-8859-2) required the -finput-charset option.

  People who would originally have wanted gcc to take that codeset from the
  locale were probably *developing* the source code in question, so they
  could easily add -finput-charset to their makefiles.



Much of the world must have migrated to UTF-8-encoded locales by now.
Reverting the "&& 0" would:

- not affect people with such a distro-default locale who build UTF-8 /
  ASCII sources: their locale codeset matches the current hardwired default;

- not affect people building sources in non-UTF-8 8-bit codesets (e.g.
  ISO-8859-2), since those projects already have to pass -finput-charset
  in their makefiles;

- but affect people who have kept a 7-bit ASCII or non-UTF-8 8-bit codeset
  in their locale and compile genuine UTF-8 sources.



People in the last group (which includes me :)) would be forced to (a)
modify their locale when building such sources as end users, or (b) find
out about -finput-charset=UTF-8 and pass it via (b1) Makefile hacking or
(b2) ./configure settings (environment variables or command-line options).

I think that's unreasonable: building random projects from the tubes would
break for this small but real group of users.



Therefore I suggest keeping the logic as-is and updating the docs instead
(gcc/doc/cppopts.texi): the description of -finput-charset should not refer
to the locale.
