Hi again,

it would be really nice to get this issue resolved, one way or another :-). As mentioned, GNU sed (via gnulib) currently does not work correctly on Mac OS X when e.g. LANG=C is set. This leads to real-world errors for users, for example when using git while GNU sed is installed.
On the other hand, I have seen no reply so far to my attempts to refute the counterarguments against these patches. So, should I just submit git patches for one (or both) of them for inclusion? Or does anybody still have reservations about this?

Cheers,
Max


On 11.06.2012 at 00:31, Max Horn wrote:

> Hi again,
>
>
> On 07.06.2012 at 14:07, Bruno Haible wrote:
>
> [...]
>
>>
>>> But this is dangerous, because now UTF-8 is set but MB_CUR_MAX is 1
>>> and various parts of sed interpret "Rémi Leblond" as an invalid
>>> character sequence for a UTF-8 character set.
>>
>> Indeed, I can see how this inconsistency leads to bugs like the described
>> ones.
>>
>> The fix could be to have two different locale_charset() functions,
>> one that returns "US-ASCII" and another one that returns "UTF-8".
>> The first one to be used when MB_CUR_MAX and mbrtowc() are used as
>> well, the second one to be used by gettext(). But the separation
>> line between the two cases is not yet clear to me. Any insights?
>
> Hum, that sounds quite complicated -- could you explain what this would gain
> over the idea of simply mapping "US-ASCII" to "ASCII", or over the patch Paul
> suggested:
>
>> --- a/lib/localcharset.c
>> +++ b/lib/localcharset.c
>> @@ -542,5 +542,12 @@ locale_charset (void)
>>    if (codeset[0] == '\0')
>>      codeset = "ASCII";
>>
>> +#ifdef DARWIN7
>> +  /* MacOS X sets MB_CUR_MAX to 1 when LC_ALL=C, and "UTF-8"
>> +     (the default codeset) does not work when MB_CUR_MAX is 1.  */
>> +  if (strcmp (codeset, "UTF-8") == 0 && MB_CUR_MAX <= 1)
>> +    codeset = "ASCII";
>> +#endif
>> +
>>    return codeset;
>>  }
>
>
> Cheers,
> Max
>
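
For reference, the decision made by the patch quoted above can be isolated into a small standalone function, which makes it easy to try out outside of gnulib. This is only a sketch: `effective_codeset` is a hypothetical helper name that does not exist in gnulib, and MB_CUR_MAX is passed in as a parameter (instead of being read from the current locale) purely so that the logic can be exercised on any platform, without the DARWIN7 guard.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT part of gnulib: given the codeset name reported
   for the locale and the value of MB_CUR_MAX, return a codeset name that
   is consistent with MB_CUR_MAX.  Mirrors the logic of the quoted patch.  */
static const char *
effective_codeset (const char *codeset, size_t mb_cur_max)
{
  /* An empty codeset name is treated as ASCII, as in the context lines
     of the patch.  */
  if (codeset[0] == '\0')
    return "ASCII";

  /* "UTF-8" requires multibyte support.  If MB_CUR_MAX is 1, callers that
     rely on mbrtowc() see single-byte behaviour, so report "ASCII".  */
  if (strcmp (codeset, "UTF-8") == 0 && mb_cur_max <= 1)
    return "ASCII";

  return codeset;
}
```

In a real program one would presumably call it as `effective_codeset (nl_langinfo (CODESET), MB_CUR_MAX)` after `setlocale (LC_ALL, "")`, with `<langinfo.h>`, `<locale.h>` and `<stdlib.h>` included.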