Hi again,
On 07.06.2012, at 14:07, Bruno Haible wrote:
[...]
>
>> But this is dangerous, because now UTF-8 is set but MB_CUR_MAX is 1
>> and various parts of sed interpret "Rémi Leblond" as an invalid
>> character sequence for a UTF-8 character set.
>
> Indeed, I can see how this inconsistency leads to bugs like the described
> ones.
>
> The fix could be to have two different locale_charset() functions,
> one that returns "US-ASCII" and another one that returns "UTF-8".
> The first one to be used when MB_CUR_MAX and mbrtowc() are used as
> well, the second one to be used by gettext(). But the separation
> line between the two cases is not yet clear to me. Any insights?

Hum, that sounds quite complicated -- could you explain what this would
gain over the idea of simply mapping "US-ASCII" to "ASCII", or over the
patch Paul suggested:

> --- a/lib/localcharset.c
> +++ b/lib/localcharset.c
> @@ -542,5 +542,12 @@ locale_charset (void)
>    if (codeset[0] == '\0')
>      codeset = "ASCII";
>
> +#ifdef DARWIN7
> +  /* MacOS X sets MB_CUR_MAX to 1 when LC_ALL=C, and "UTF-8"
> +     (the default codeset) does not work when MB_CUR_MAX is 1.  */
> +  if (strcmp (codeset, "UTF-8") == 0 && MB_CUR_MAX <= 1)
> +    codeset = "ASCII";
> +#endif
> +
>    return codeset;
>  }

Cheers,
Max